The 1963 Invitational Conference on Testing Problems centered upon
both theoretical and practical aspects of measurement. Speakers
reviewed and analyzed the current thinking that underlies the basic
concepts of norms, reliability, and validity. The potential of
cognitive and non-cognitive tests was explored, as were the social
consequences of tests in general. In contrast to these theoretical
discussions, the Conference featured two interesting reports on the
application of objective tests in the field of medical education. All
in all, it was a most stimulating program that balanced the reality
of the present with implications for the future.

I should like to extend our thanks to Dr. Alexander G. Wesman
who, as Chairman, was responsible for planning this program.
We owe our thanks also to Dr. Jerome S. Bruner for his luncheon
address and to the other distinguished speakers whose efforts
made this Conference such a success.

Henry Chauncey 
PRESIDENT



To be designated as chairman of the ETS Invitational Conference
on Testing Problems is simultaneously an honor and an
opportunity. The list of prior chairmen is a highly distinguished
one; the excellence of previous programs is documented by the
constancy of increase in attendance at the meetings. Opportunity
is provided by the freedom given the chairman to compose the
program as he wishes, and by the regard in which the conference
is held, a regard which predisposes desired speakers to accept
the chairman's invitation. The chairman who fails to provide a
stimulating meeting has only himself to blame; if he chooses
wisely, the speakers will fulfill his responsibilities to his
credit.

In organizing the 1963 conference, my design was to have
basic concepts and concerns in the field of measurement
presented comprehensively, informedly, and informatively. The
first session was devoted to "state-of-the-science" overviews of
three fundamental concepts: norms, reliability, and validity. Dr.
Roger T. Lennon called for more active attention to development
of norming theory, summarized current norming practices, and
urged the establishment of a system which would permit
comparable norms for tests whose primary standardizations are
based on somewhat differing samples of the population. Dr.
Robert L. Thorndike commented on proposals, practices, and
procedures for estimating reliability; he structured his discussion
in terms of concept formulations, construction of mathematical
models, and methods of obtaining pertinent data. The third
fundamental test characteristic, validity, was discussed by Dr.
Anne Anastasi. Under the topic headings construct validation,
decision theory, moderator variables, synthetic validity, and
response styles, she reviewed new approaches devised for the
study of validity during the last ten years.

The second morning session was devoted to the report and
appraisal of test use in a specific field of application: medicine.
Dr. John P. Hubbard described a testing method used by the
National Board of Medical Examiners to appraise the diagnostic
competence of an intern, employing a sequential, programmed
pattern in a realistic clinical situation. Under the title "Alternate
Criteria in Medical Education and Their Correlates," Dr. E. Lowell
Kelly reported an extensive investigation of predictor and
criterion variables of concern to medical schools.

At the luncheon meeting we were privileged to hear a most
interesting address by Dr. Jerome Bruner. His discussion of
learning processes and concept formation development was truly
a highlight: stimulating, scholarly, and ...

The three afternoon speakers directed our attention to
implications and consequences of measurement. The first, Dr.
Warren G. Findley, discussed current theory and application in
cognitive fields: the appraisal of ability. He reviewed our changing
approaches to investigating the structure and organization of
mental ability, and our contemporary methods of appraising
achievement in schools and colleges. Non-cognitive aspects of
student performance were treated by Dr. Samuel Messick, who
considered the potential contribution of personality assessment
techniques to the prediction of success of college students. He
undertook to raise (and answer) questions as to scientific
standards for evaluating personality devices, and ethical problems
in the use of such devices in practical decision making. The final
speaker of the day was Dr. Robert L. Ebel, whose topic was
"The Social Consequences of Educational Testing." Dr. Ebel
examined the charges recently aimed at testing by sympathetic
and by antagonistic critics; subjected the charges to judicial
consideration; accepted the validity of some, rejected the validity
of others; and brought the several issues into fairer perspective
in concluding remarks on the social consequences of testing.

I wish to express my deep appreciation to the committee of
previous Invitational Conference chairmen which selected me and
to Educational Testing Service which sponsored the meeting and
supplied professional and practical assistance at every stage of
the development of the program. It was for me a most rewarding
experience.

Alexander G. Wesman

CHAIRMAN
Several months ago, your Chairman extended to me this invitation,
which I was most pleased to accept, to take part in today's
proceedings. He said that he would like me to talk about
norms. "Ah," I said, "that is a rather broad topic. Can you
give me any hints as to which aspects of it you would suggest
I concern myself with?" By dint of patient questioning I was
able to elicit from him his hope that I would undertake a
review of developments over the past 15 or 20 years in norming
theory, norming technology, and related areas; a survey of
current practices with respect to norming various types of
tests; a critical analysis thereof; a prospectus for needed
improvement; and perhaps a prediction of future developments in
the realm of test norming; all, however, not to consume more
than 20 or at the most 25 minutes. Then, like all good chairmen,
he said, "But, of course, use your own discretion," neatly
combining this passing of the buck with the subtle flattery of
crediting me with possession of some discretion.

I have found it convenient to organize my remarks under two
topics, which I shall refer to as norming theory, on the one
hand, and norming technology, on the other.

As to norming theory, I shall have relatively little to say,
and this for the best of reasons, namely, that the past decade
has seen little development in this area. Where the literature
abounds with theoretical treatment of validity and reliability,
it is almost devoid of systematic treatment of norming; the
words "norms" and "norming," for example, have not even
appeared in the index of the Annual Review of Psychology for
the past three years.

Indeed, some of you may even wonder what I have in mind
when I speak of "norming theory." Surely, you will say, everyone
knows what norms are and why we need them; what more
is there to it than that? Perhaps I can make my meaning clear
by recalling that the administration of a test to an individual
or a group can, in most instances, be thought of as akin to the
conduct of a scientific experiment. Performance on a test, when
interpreted according to suitable norms, serves as evidence
supportive or not supportive of a hypothesis: this pupil has or has
not made progress in reading during the past school year; the
group using this textbook has made significantly greater
progress than comparable students spending the same amount of
time on this subject; etc. Now the inferences or conclusions that
are drawn from this experiment-like testing are obviously
conditioned by attributes of the norming group; but we have little
in the way of a body of general principles relating test
interpretation to norm group characteristics, little spelling out of the
relations between norms, let us say, and test validity, little
theory, in a word, of norming. I shall go no further in developing
this concept; for purposes of this paper, suffice it to report,
as I did a moment ago, that the past decade has been
productive of very little advance in this area.

But if it appears that the past decade has been disappointing
with respect to advances in norming theory, the picture with
respect to norming technology and current practice is a more
encouraging one. I discern at least four lines of development:

1. Applications of sampling theory to test standardization,
particularly as reflected in the work of Frederic Lord, have pointed
the way to more efficient data-gathering designs.
2. We have added substantially to our knowledge about
community and school system variables related to performance on
achievement and general mental ability tests. Dr. Jack Merwin,
some three years ago, reviewing the literature on community
and school characteristics related to test performance, found
some eighty-odd relevant studies; a decade ago, there was
scarcely a score. The work of Dr. Flanagan and his associates
in Project Talent has already eventuated in a wealth of
information about characteristics related to performance on various
types of tests at the secondary school level, some corroborative
of earlier findings, others raising questions about certain
assumptions hitherto widely acted upon in definition of norming
populations.

3. There is a general willingness on the part of the major test- 
making agencies to commit the resources required for adequate 
test standardization, at least with respect to their most import- 
ant test series. 

4. The major test publishers, several years ago, began to give
serious consideration to the use of a common anchor test in
norming their respective tests, as a device for heightening
comparability among the norms. This enterprise has moved forward
less rapidly than it should have, a state of affairs for which, I
regret to say, I am as much responsible as any one individual.

By way of documenting these points, and as introduction to
additional points that I shall make, I ask you to bear with me
while I read to you excerpts from the descriptions of the
standardization programs for six of the most widely used batteries
of tests.

TEST A

"Basic procedure for ruling out bias was to select a stratified
sample of communities on which to base the norms. Communities
were stratified on a composite of factors which have been found
to be related to the measured intelligence of children in the
community. Each community which volunteered to serve in the
normative testing was evaluated with respect to the factors
of: 1) per cent of adult illiteracy; 2) number of professional
workers per thousand; 3) per cent of home ownerships; 4)
median home rental value. On the basis of a composite of these
factors each community was classed as very high, high,
average, low, or very low. All the pupils present in each grade in
the community were to be tested . . ."
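The Test A excerpt does not say how the four factors were combined. One plausible reading, sketched here purely as an assumption, is to standardize each factor, reverse the sign of illiteracy, average the results into a composite, and cut the ranked communities into five equal classes. The community names, figures, equal weights, and cut rule below are all hypothetical.

```python
import statistics

# Hypothetical figures for five communities. The four factors are those named
# in the Test A excerpt; values, equal weights, and cut rule are assumptions.
COMMUNITIES = {
    "Town A": {"illiteracy_pct": 6.0, "professionals_per_1000": 10, "home_owner_pct": 40, "median_rent": 25},
    "Town B": {"illiteracy_pct": 4.5, "professionals_per_1000": 18, "home_owner_pct": 50, "median_rent": 35},
    "Town C": {"illiteracy_pct": 3.0, "professionals_per_1000": 25, "home_owner_pct": 58, "median_rent": 45},
    "Town D": {"illiteracy_pct": 2.0, "professionals_per_1000": 33, "home_owner_pct": 65, "median_rent": 55},
    "Town E": {"illiteracy_pct": 1.0, "professionals_per_1000": 45, "home_owner_pct": 75, "median_rent": 70},
}

# +1 if a higher value is assumed to go with higher measured intelligence, -1 otherwise.
DIRECTION = {
    "illiteracy_pct": -1,
    "professionals_per_1000": 1,
    "home_owner_pct": 1,
    "median_rent": 1,
}

LEVELS = ["very low", "low", "average", "high", "very high"]


def composite(communities):
    """Average of the signed z-scores of the four factors for each community."""
    totals = dict.fromkeys(communities, 0.0)
    for factor, sign in DIRECTION.items():
        values = [c[factor] for c in communities.values()]
        mean, sd = statistics.fmean(values), statistics.pstdev(values)
        for name, c in communities.items():
            totals[name] += sign * (c[factor] - mean) / sd
    return {name: total / len(DIRECTION) for name, total in totals.items()}


def classify(communities):
    """Rank communities on the composite and cut the ranking into five equal classes."""
    scores = composite(communities)
    ranked = sorted(scores, key=scores.get)  # lowest composite first
    n = len(ranked)
    return {name: LEVELS[i * len(LEVELS) // n] for i, name in enumerate(ranked)}


for town, level in classify(COMMUNITIES).items():
    print(town, level)
```

Equal weighting is only the simplest choice; a publisher could just as well have weighted the factors by their correlations with measured intelligence.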

TEST B

"Schools in the norm sample were so chosen that the
representation from each of nine regions is similar to the proportions
in the United States. At the inception of the program a random
sample of all school superintendents in the country was chosen.
The superintendents were asked if they were willing to
participate in a long-range standardization program. The selection
of schools was then random from all available schools in the
region."

TEST C

"More important than the sheer number of students tested,
however, is the degree to which they adequately represent the
total national public school population at those grades. U.S.
school enrollment data were obtained showing distributions of
students by geographic region. Apportionment according to
community size within each geographic region was based on
1960 census figures for the distribution of population among
communities of various sizes. Invitations to participate in the
standardization program were then extended to appropriate
school systems, so selected that the group as a whole would
typify the national population. Eighty-five school systems in
thirty-seven states participated in the standardization program.
All cooperating school systems were asked to test complete
classroom groups from one or more schools so chosen as to
be representative of the community."

TEST D

"The total pupil enrollment in public elementary and
secondary schools in the United States is the reference population
on which the norms are based . . . Data in the Biennial Survey
of Education and general educational, social, cultural and
economic conditions were considered in grouping states with
similar characteristics into geographical regions. Specific
characteristics considered were average expenditures per pupil for
instructional purposes, length of school term, and type of school
organization. Community size was the second factor used for
stratification control. The norming samples for all grades within
a given level were independent. Thus, any single school
contributed to only one grade for any single level of the test. No one
school was permitted to contribute to samples for two successive
grades, even though they were for different levels of the test. A
total of 672 school systems were contacted, of which 341 agreed
and actually did participate in the norming program. A total
of 69,345 pupils in 48 states were tested in this program."

TEST E

"The norms purport to describe the achievement of pupils
'representative' of the nation's public school population. Authors
and publishers sought to obtain a norm group that would
match the national school population with respect to certain
characteristics known or assumed to be related to achievement.
These characteristics include size of school system, geographical
location, type of community, intelligence level of pupils and
type of system (segregated or non-segregated). Each field
representative was asked to designate [?] school systems meeting
specifications that would yield a properly representative total
norm group. A total of 225 systems accepted the invitation and
carried through all necessary phases of the program. Included
in this group are public school systems from 49 states; the
number of pupils tested in the standardization program was
over 500,000. One additional control relating to age was
exercised in the selection of the final norm group. Pupils
falling outside the 18 months range modal or typical for each grade
were excluded from the norm group; the per cent of pupils thus
excluded ranged from 10 to 20. Participating systems were
required to test entire enrollments in at least three consecutive
grades."
TEST F

"The population to which the norms apply includes all students
in grades 9 through 12 in regular daily attendance at public
high schools throughout the United States. The sample on which
the norms are based was drawn so as to reflect the regional
distribution and the community-size distributions for the national
population. A preliminary sample of school systems was chosen
strictly at random from each of the 36 strata. The number of
systems chosen from each stratum was based on the average
high school enrollment per grade within that stratum. This
preliminary sample of 714 systems included approximately three
times as many students as were demanded by the sample
specifications. Invitations were issued to these 714 school systems,
and over 200 school systems responded affirmatively. In
multiple-building systems either all buildings or randomly selected
buildings were included in the sample. All pupils in all grades
in the cooperating schools were tested. A total of 366 schools
in 254 school systems participated in the standardization project."
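The Test F excerpt states only that the number of systems drawn from each stratum depended on the stratum's average enrollment per grade, and that the preliminary sample held roughly three times the students required. A minimal sketch of one way such an allocation could be computed follows. It assumes each stratum's pupil quota is proportional to its share of the national population and is divided by the average enrollment per grade, rounding up. The four strata (standing in for the 36) and all numbers are invented.

```python
# Hypothetical strata: (percent of the national grade 9-12 population,
# average enrollment per grade in a typical school system of the stratum).
# Four strata stand in for the 36 of the Test F design; all numbers are invented.
STRATA = {
    "Northeast, large city": (20, 2500),
    "Northeast, small town": (10, 150),
    "South, rural": (30, 100),
    "West, suburban": (40, 600),
}


def systems_to_invite(strata, pupils_per_grade_needed, oversample=3):
    """Number of systems to draw at random from each stratum.

    Each stratum's share of the required pupils is inflated by `oversample`
    (the excerpt reports roughly three times the required number of students)
    and divided by the stratum's average enrollment per grade, rounding up.
    """
    if sum(pct for pct, _ in strata.values()) != 100:
        raise ValueError("stratum percentages must sum to 100")
    plan = {}
    for name, (pct, avg_per_grade) in strata.items():
        numerator = pct * pupils_per_grade_needed * oversample
        denominator = 100 * avg_per_grade
        plan[name] = -(-numerator // denominator)  # integer ceiling division
    return plan


plan = systems_to_invite(STRATA, pupils_per_grade_needed=10_000)
print(plan, sum(plan.values()))
```

Integer arithmetic keeps the ceiling exact; strata of small systems (here "South, rural") need many more systems to reach the same number of pupils.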

I do not cite these particular standardization projects as
examples of either good or bad practice in norming tests; much
less do I propose to criticize any features of any one of these
programs. I adduce them rather as representative of current
practice on the part of major test publishers with respect to
standardization of their more important test offerings. The six
excerpts are, by intent, chosen from publications of the six
major test publishers; the excerpts are mostly verbatim, but not
complete; three are for achievement batteries, three for general
ability tests.

It seems to me quite clear from the descriptions that the
norming in each of the instances cited must be judged to be a
planful, earnest and informed attempt on the part of the
respective authors and publishers to develop appropriate norms,
an effort implying in every instance substantial commitment of
time and resources. I may observe in passing that the
author-publisher expenditure is likely to be in excess of 40 or 50 cents
for each case tested in the standardization of a group (and very
much higher in the case of an individual) test, and you can
readily appreciate the size of the commitment in these norming
programs, involving as they did tens or even hundreds of
thousands of pupils. It is no longer possible to say, as it might
have been 20 years ago, that the norms represent adventitious
collections of available test scores bearing only accidental
relationship to an accurate description of the test performance of
definable groups of pupils. At least with respect to the tests
involved here (and they would collectively represent a large
fraction of the testing done in elementary and secondary schools),
such shortcomings as the norms may possess, either viewed
individually or in relation to one another, do not stem from
carelessness, lack of sophistication, or unwillingness to devote
the resources needed to do respectable norming.

But shortcomings are in evidence; the norms do leave much
to be desired, at least when viewed across tests. There are
discernible marked differences with respect to the population whose
achievement or ability the norms purport to describe; the
variables considered important as stratifying variables; sampling
procedures; the proportions of voluntary cooperation forthcoming;
the degree of control over administration and scoring; and
other critical characteristics. It is impossible to state on a priori
grounds the effect that such differences may have in introducing
systematic variations among the several sets of norms, but there
are good reasons for supposing that the differences in norms
ascribable simply to these variations in norming procedures are
not negligible. When we consider that to such differences from
test to test there must be added differences associated with
varying content and with the time at which standardization programs
are conducted, including the time of the school year, the issue
of comparability, or lack of it, among the results of the various
tests may begin to be seen in proper perspective. Empirical data
reveal that there may be variations of as much as a year and
a half in grade equivalent among the results yielded by various
achievement tests; variations of as much as 8 or 10 points of
IQ among various intelligence tests are, of course, by no means
uncommon.
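How much a derived score depends on the norm group can be illustrated with percentile ranks. The sketch below converts the same raw score against two invented norm groups, imagined as coming from two different sampling plans. It uses the common convention of counting half of any tied scores; neither the data nor the convention comes from the text.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting half of any ties."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100 * (below + 0.5 * ties) / len(ordered)


# Two invented norm groups for the same test, drawn under different sampling plans.
NORM_GROUP_1 = [20, 22, 25, 27, 30, 31, 33, 35, 38, 40]
NORM_GROUP_2 = [24, 26, 29, 31, 33, 34, 36, 38, 41, 44]

raw = 31
print(percentile_rank(raw, NORM_GROUP_1))  # 55.0
print(percentile_rank(raw, NORM_GROUP_2))  # 35.0
```

The pupil's performance is identical in both cases; only the reference population changed, which is the kind of discrepancy the passage attributes to differing norming procedures.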

Some of you may feel that this lack of comparability among
results of various tests is not really a matter of great concern,
that as long as a school or school system consistently uses
a given test or test series, it need not be too distressed that
some other test or series would yield somewhat different results.
If there be any such among you, may I cite for you a
situation presently prevailing in the state of California, to the
distress of both California educators and the test publishers. The
California legislature, in response to public clamor over the
quality of education in that state, enacted legislation prescribing the
administration of ability and achievement tests in grades 5, 8,
and 11, on an annual basis, to all public school pupils in the
state. The State education department issued implementing
regulations which, in a wholly laudable attempt to provide for a
measure of local autonomy in the selection of evaluation
instruments, established an approved list of about half a dozen ability
tests and an equal number of achievement tests from which local
school districts might choose the instruments to be used. School
districts are required to submit results to the State education
department, which, in turn, is charged with the responsibility of
preparing a summary of pupil achievement for the state for
submission to the state board of education and presumably to the
legislature and public. Now imagine the task that confronts the
state education department in attempting to combine into a single
summary the results, non-comparable as they are known to be,
from a variety of tests. How can this agency discharge this
responsibility and give to the legislature and the public a clear
picture of pupil attainments? Must it undertake its own study of
equivalence among the half dozen or so measures? This is an
expensive and complicated undertaking, the results of which will
in any case be subject to serious limitations. Must it resort to the
alternative of requiring use of the same instrument by all school
districts? I for one would consider this to be undesirable on
various educational grounds.

Is this a state of affairs that we in testing should be willing
to accept with complacency? I do not believe we should; I do
not believe we have to. To do so, in my opinion, is to court ...
Neither do I think, as do some within and without the testing
field, that these vexing problems of norming should prompt us
to repudiate the notion of national norms as an unattainable,
unrealistic, and meaningless goal. For both general mental
ability or scholastic aptitude measures and for achievement tests,
there is surely a place and a need for a single, comprehensively
based, broadly descriptive set of norms, whatever additional
need may also exist for data descriptive of particular samples
of the general population. Rather, the proper direction for us
now to take would seem to me to be along the path of a
collaborative attack on the norming problem by the major
test-producing agencies. I think each of us publishers should be
willing to sacrifice whatever competitive advantage one or another
of us may have felt he enjoyed by virtue of the superior norms
of his tests, for the sake of the great gains in test interpretation
that would flow from adoption of common definitions of norms
populations and norming methods. We might even succeed in
having schools give pre-eminence in selecting tests to
considerations of content, validity and reliability.

Exactly 23 years ago this very day, speaking in this very
forum, Dr. Cureton read a paper on norms that has not, in
my opinion, been surpassed by any subsequent paper on this
issue. Cureton called for a general adoption by test-making
agencies of a system of anchoring their respective tests to a common
scale. He urged the development of a "basic anchor test," its
standardization on a "genuinely representative" sample of the general
population, and the equating of intelligence tests and
achievement tests of all publishers to this common scale. The attainment
of this state of affairs would mark, in Cureton's words, "the
date of maturity of educational and mental measurements as a
science, and of educational guidance and counseling as a
profession." We are, alas, not yet at this level of maturity by
Cureton's definition.
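Cureton's proposal rests on equating each publisher's test to a common anchor scale. A sketch of one textbook method, linear (mean-sigma) equating, follows. It assumes a single-group design in which the same representative sample takes both the publisher's test and the anchor test. The scores are invented. Real cross-publisher equating would likely need more elaborate designs (for example, equipercentile methods when the score distributions differ in shape).

```python
import statistics


def linear_equate(x, new_test_scores, anchor_scores):
    """Express score x on the new test in anchor-scale units by matching
    the mean and standard deviation of the two score distributions."""
    mx, sx = statistics.fmean(new_test_scores), statistics.pstdev(new_test_scores)
    my, sy = statistics.fmean(anchor_scores), statistics.pstdev(anchor_scores)
    return my + (sy / sx) * (x - mx)


# Invented scores of one common sample on a publisher's test and on the anchor test.
NEW_TEST = [10, 12, 14, 16, 18]
ANCHOR = [40, 44, 48, 52, 56]

print(linear_equate(15, NEW_TEST, ANCHOR))
```

Here a raw score of 15, one point above the new test's mean, maps to about 50 on the anchor scale, because the anchor's standard deviation is twice as large.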

While I have no reason to suppose that any appeal that I
might make along these lines will be more potent than Dr.
Cureton's (save only that the need for some such development
is now far more evident than it was in 1940), I would like to
close my remarks with a similar call to concerted action now
by the major test-making agencies. Surely we now know enough
about the characteristics of communities and school systems
related to performance on achievement and mental ability
measures, and are sufficiently close to common understanding of
the proper general population on which to develop norms, to
enable us to agree on a generally acceptable definition of the
population whose test performance we seek to describe; to specify
the distribution of this population on measures of economic
status, cultural status, educational effort and caliber of pupil
population, plus other demographic features to which norming
samples will be made to conform; to push ahead with the creation
of an anchor instrument that will serve as a defining variable for
all standardization groups, at least for tests in the general
cognitive domain; and thus to bring our collective efforts to that
level of maturity for which Dr. Cureton pleaded. As we value the
concept of a science of measurement of human abilities, let us take
at least these steps to make our efforts more deserving of the
label "scientific."
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It is Just 17 yfears ago that I had the honor^o&^ddj^ssing this 
august assemblage — soiiiewhat smaller and less inip©sing then 
than now — on "LrOgical Dilemmas in' the Estimatibn^M Relia- 
bility." I should have stopped when I was ahead! But^some evil 
genie brought his power to bear upon your program chairman 
for this jear, and, here I am let out of the bottle again to conv 
ment on developments that have occurred in thinking about and 
dealing with the topic of reliability over tlie fime span since last 
I held forth, I don't know whether^! am being used as a prac- 
tical example of the irfiportance of test-retest reliability' ©r as a 
demonstration of the fact that once ability to r^ad statistical 
exposition has reached a maximum in the late twenties it goes 
into a positively' accelerated curve of cjeclme from that time on. 
Fortunately ^ few of you in this room today have any recollection 
of what I said in 1946 — or are likely to remember beyond your 
second cocktail this afternoon what I say today. Unfortunately * 
my deathless prose will be preserved for posterity in the Proceed- 
ings oi the occasion. But for this there is no mitidote. 

The issues of test reliability may be approached, it has seemed
to me, at three levels. The first of these is the verbal level of
formulation and definition of the concept. A second level is that
of mathematical model-building, leading to specification of a set
of formulas and computational procedures by which the
parameters specified in the model are to be estimated. A third level
is that of experimental data-gathering procedures, under which
certain tests are given to certain subjects at certain times and
treated in certain ways to yield scores that are the raw materials
to which we apply our formulas and computational procedures.

Developments in the past 17 years appear to have been
primarily at the first two of these levels. In fact, Oscar Buros,
addressing the American Educational Research Association last
year, expressed the view that the last 35 years have been
retrogressive, so far as our empirical procedures for appraising
reliability are concerned. He exhorted us to return to the virtuous
ways of our forefathers and stick to the operation of testing the
individual with two or more experimentally independent tests,
in order to get the data which permit generalizations about
precision of measurement over occasions as well as over test items,
and to this I can only say "Amen." He urged us not to
backslide from the high standards of precision that Truman Kelley
laid down for us in 1927, and to this I would comment "It all
depends." But my point is that I am not aware of any
distinctive proposals for new patterns of data-gathering that call for
our special attention today, though it is always well that we
be aware of the limitations of the methods we are using.

Turning now to verbal formulation, perhaps the major trend
has been toward increasingly explicit formulation of the concept
that performance on a test should be thought of as a sample
from a defined universe of events, and that reliability is
concerned with the precision with which the test score, that is, the
sample, represents the universe. I shall not try to be a historian,
but will merely note that this idea has been made fairly explicit
by Buros, by Cronbach, by Tryon, and probably by others.
What we may call the "classical" approach to reliability tended
to be conceptualized in terms of some unobservable underlying
"true score" distorted in a given measurement by an equally
unobservable "error of measurement." The corresponding
mathematical models and computational routines were procedures
for estimating the magnitude, absolute or relative, of this
measurement error. The formulation in terms of sampling does away
in one lightning stroke with the mystical "true score," somehow
enshrined far above the mundane world of scores and data,
and replaces it with the less austere "expected value" of the
score in the population of values from which the sample score
was drawn.
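The two formulations can be set side by side in a small simulation. Under the classical model an observed score is a true score plus independent error, and reliability is the ratio of true-score variance to observed-score variance. Under the sampling reading, the "true score" is simply each examinee's expected value over repeated draws. The sketch below is a generic illustration, not Thorndike's own procedure. It assumes normally distributed expected values and errors with invented standard deviations and checks that two independent administrations correlate at roughly the classical ratio.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_examinees=5000, expected_sd=10.0, error_sd=5.0, seed=1):
    """Each examinee has an expected value; each administration adds independent error."""
    rng = random.Random(seed)
    expected = [rng.gauss(50.0, expected_sd) for _ in range(n_examinees)]
    admin_1 = [t + rng.gauss(0.0, error_sd) for t in expected]
    admin_2 = [t + rng.gauss(0.0, error_sd) for t in expected]
    # Classical reliability: variance of expected values over variance of observed scores.
    theoretical = expected_sd ** 2 / (expected_sd ** 2 + error_sd ** 2)
    return theoretical, pearson(admin_1, admin_2)


theoretical, estimated = simulate()
print(f"theoretical {theoretical:.3f}, correlation of two administrations {estimated:.3f}")
```

With these assumed values the theoretical reliability is 0.8, and the simulated correlation should land close to it; which "administrations" count as draws from the same universe is exactly the question the text goes on to raise.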

Now what are the implications, the advantages, and possibly
the limitations of this "sampling" conception over the classic
"true score and error" conception?
For myself, I cannot say that the advantage lies in
simplification and clarification. This notion of a "universe of possible
scores" is in many ways a puzzling and somewhat confusing
one. Of what is this universe composed? Suppose we have given
Form A of the XYZ Reading Test to the fifth graders in our
school and gotten a score for each pupil. Of what universe of
scores are these scores a sample? Of all possible scores that we
might have gotten by giving Form A on that day? Of all
possible scores that we might have gotten by giving Form A
sometime that month? Of all possible scores that might have been
gotten by giving Forms A or B or C or other forms up to a
still-unwritten Form K on that day? Of scores on these same
numerous and presumably "parallel" forms (and we shall have
to ask what "parallel" means under a sampling conception of
reliability) at some unspecified date within the month? Of scores
on the whole array of different reading tests produced by
different authors over the past 23 years? Of scores on tests of some
aspect of educational achievement not further specified?
As soon as we try to conceptualize a test score as a sample from some universe, we are brought face to face with the very knotty problem of defining the universe from which we are sampling. But I suppose this very difficulty may be in one sense a blessing. The experimental data-gathering phase of estimating reliability has always implied a universe to which those data corresponded. Split-half procedures refer only to a universe of behaviors produced at one single point in time, retest procedures to a universe of responses to a specific set of items, and so forth. Perhaps one of the advantages of the sampling formulation is that it makes us more explicitly aware of the need to define the universe in which we are interested, or to acknowledge the universe to which our data apply. Certainly, over the past 30 years, all of us who have written for students and for the test-using public have insistently harped upon the nonequivalence of different operations for estimating reliability and emphasized the different universes to which different procedures referred.

The notion of a random sample from a universe of responses seems most satisfying and clear-cut when we are dealing with some unitary act of behavior, which we score in some way. Examples would be distance jumped in a broad jump, time to run 100 yards, speed of response on a trial with a reaction time device, or number of trials to learn a series of nonsense syllables to a specified level of mastery. In these cases, the experimental specification of the task is fairly complete. Thus, for the 100-yard run, we specify a smooth, straight, well-packed cinder track, a certain type of starting blocks, certain limitations on the shoes to be worn, a certain pattern of preparatory and starting signals, and a certain procedure for recording time. A universe could then be the universe of times for a given runner over a certain span of days, weeks or months of his running career. Data from two or more trials under these conditions would give us some basis for generalizing about the consistency of this behavior for this defined universe. We could also extend the universe if we wished (to include wooden indoor tracks, for example, or running on grass, or running in sneakers instead of track shoes) and sample randomly from this more varied universe. As conditions were varied, we might expect typical performance to vary more widely and precision to be decreased.
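To make the runner example concrete, here is a minimal sketch under stated assumptions: one hypothetical runner whose expected time is taken to be 11.0 seconds, and a trial-to-trial spread that is simply assumed to widen when the universe of conditions is broadened. The sample mean estimates the expected value, and its standard error shows how precision falls as the universe grows. The names `sample_times`, `estimate`, and `EXPECTED_TIME` are illustrative only.

```python
import random
import statistics

EXPECTED_TIME = 11.0  # hypothetical expected value (seconds) for one runner


def sample_times(n_trials, condition_sd, seed):
    """Draw hypothetical 100-yard times for one runner.

    condition_sd is an assumed trial-to-trial spread; a broader universe of
    conditions (indoor wood, grass, sneakers) is modeled simply as a larger spread.
    """
    rng = random.Random(seed)
    return [rng.gauss(EXPECTED_TIME, condition_sd) for _ in range(n_trials)]


def estimate(times):
    """Return (sample mean, standard error of that mean)."""
    mean = statistics.fmean(times)
    standard_error = statistics.stdev(times) / len(times) ** 0.5
    return mean, standard_error


narrow = sample_times(10, condition_sd=0.15, seed=1)  # cinder track, track shoes only
broad = sample_times(10, condition_sd=0.60, seed=1)   # varied surfaces and shoes

print("narrow universe: mean %.2f s, SE %.3f s" % estimate(narrow))
print("broad universe:  mean %.2f s, SE %.3f s" % estimate(broad))
```

Using the same seed for both runs makes the two samples differ only in spread, so the fourfold larger spread shows up directly as a fourfold larger standard error.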

We are usually interested in estimating precision for each of a population of persons, rather than just for some one specific person, and so we are likely to have a sampling from some population of persons. The nature of that population will also influence estimates of precision, and so it will be important that the population be specified as well as the conditions. Precision of estimating time to run 100 yards is probably much greater for college track stars than for middle-aged professors, for whom one might occasionally get scores approaching infinity. But it would be possible to specify the population of individuals fairly satisfactorily, as well as the population of behaviors for each individual. We would then have an at least two-dimensional universe, which we could sample in a presumably random fashion; and we could then analyze our sample of observations to yield estimates of the relative precision with which a person could be located within the group or the absolute precision with which his time could be estimated in seconds.
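The two-dimensional universe of persons and trials can be analyzed with a persons-by-trials analysis of variance. The sketch below is one conventional way of doing so, not necessarily the procedure the speaker had in mind: it estimates variance components from invented times for four runners on three trials, then reports a relative index (how well a person's mean time locates him within the group) and an absolute standard error in seconds. Function names and data are hypothetical.

```python
import statistics


def variance_components(scores):
    """Persons x trials ANOVA with one observation per cell.

    scores[p][t] is the time of person p on trial t. Returns estimated
    variance components (persons, trials, residual); negative estimates
    are truncated to zero, a common convention.
    """
    n_p, n_t = len(scores), len(scores[0])
    grand = statistics.fmean(x for row in scores for x in row)
    person_means = [statistics.fmean(row) for row in scores]
    trial_means = [statistics.fmean(row[t] for row in scores) for t in range(n_t)]

    ss_persons = n_t * sum((m - grand) ** 2 for m in person_means)
    ss_trials = n_p * sum((m - grand) ** 2 for m in trial_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_persons - ss_trials

    ms_persons = ss_persons / (n_p - 1)
    ms_trials = ss_trials / (n_t - 1)
    ms_resid = ss_resid / ((n_p - 1) * (n_t - 1))

    var_persons = max((ms_persons - ms_resid) / n_t, 0.0)
    var_trials = max((ms_trials - ms_resid) / n_p, 0.0)
    return var_persons, var_trials, ms_resid


def precision(scores):
    """Relative index and absolute SE (seconds) for a person's mean over trials."""
    n_t = len(scores[0])
    var_p, var_t, var_e = variance_components(scores)
    relative = var_p / (var_p + var_e / n_t)      # placement relative to the group
    absolute_se = ((var_t + var_e) / n_t) ** 0.5  # error in the time itself
    return relative, absolute_se


# Hypothetical times (seconds): four runners, three trials each.
times = [
    [10.9, 11.1, 11.0],
    [12.0, 12.2, 12.1],
    [13.1, 12.9, 13.0],
    [15.0, 15.3, 15.1],
]
relative, absolute_se = precision(times)
print(f"relative precision {relative:.3f}, absolute SE {absolute_se:.3f} s")
```

The absolute standard error includes trial-to-trial (day or condition) variance, while the relative index ignores it, because a condition that slows everyone equally does not change anyone's standing in the group.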

When we are dealing with typical aptitude or achievement tests, however, in which the score is some type of summation of scores upon single items, the conception of the universe from which we have drawn a sample becomes a little more fuzzy. Here, clearly, we are concerned with sampling not only of responses to a given situation but also of situations to be responded to. How shall we define that universe? The classical approach to reliability tended to deal with this issue by postulating a universe of equivalent or parallel tests and by limiting the universe from which our sample is drawn to this universe of parallel tests. Parallel tests may be defined statistically as those having equal means, standard deviations, and correlations with each other and with other variables. But they may also be defined in terms of the operations of construction, as tests built by the same procedures to the same specifications. If we adopt the second definition, statistical characteristics will not be identical: the tests will vary in their statistical attributes to the extent that different samples of items, all chosen to conform to a uniform blueprint or test plan, will produce tests with somewhat differing statistical values.

But some of the recent discussions seem to imply a random sampling of tests from some rather loosely and broadly defined domain: the domain of scholastic aptitude tests, or the domain of reading comprehension tests, or the domain of personal adjustment inventories. Clearly, these are very vague and ill-defined domains. A sampling expert would be hard put to delimit the universe or to propose any meaningful set of operations for sampling from it. And in the realm of practical politics, I question whether anyone has ever seriously undertaken to carry out such a sampling operation. One might argue that the data appearing in the manual of the XYZ Reading Test, showing its correlations with other tests, are an approximation of such a domain sampling. But how truly does the set of tests, taken collectively, represent a random sampling from the whole domain of reading tests? One suspects that the tests selected for correlating were chosen by the author or publisher on some systematic and non-random basis: because they were widely used tests, because data with respect to them were readily available, or for some other non-random reason.

We note, further, that as we broaden our conception of the universe being sampled from that of all tests made to a certain uniform set of specifications to all tests of a certain ability or personality domain, we begin to face the issue of whether we are still getting evidence on reliability or whether we are now getting evidence on some aspect of construct validity. But, once again, perhaps we should consider it a contribution of the sampling approach that it makes explicit to us, and heightens our awareness of, the continuity from reliability to validity. Cronbach offers the single term "generalizability" to cover the whole gamut of relationships, from those within the most restricted universe of near-exact replications to those extending over the most general and broadly defined domain, and develops a common statistical framework which he applies to the whole gamut. Recognition that the same pattern of statistical analysis can be used whether one is dealing with the little central core or with all the layers of the whole onion may be useful. On the other hand, we may perhaps question whether this approach helps to clarify our meaning of "reliability" as a distinctive concept.

A third context in which the random sampling notion has been applied to the conceptualization of reliability has been the context of the single test item. That is, one can conceive of a certain universe of test items: let us say, for example, the universe of vocabulary items. A given test may be considered to represent a sample drawn from this item universe. This conception provides the foundation for the estimation of test reliability from the statistics of the items in the sample, and thus leads to a somewhat more generalized and less restrictive form of the Kuder-Richardson reliability estimate.
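For items scored 0 or 1, the Kuder-Richardson Formula 20 estimates test reliability from the proportion passing each item and the variance of total scores. A minimal sketch follows; the five-pupil, four-item response matrix is invented purely for illustration, and the function name `kr20` is hypothetical.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for a persons x items matrix of 0/1 scores.

    Item variances p*q and the total-score variance both use divisor n,
    so the ratio between them is consistent.
    """
    n_persons = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons

    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_persons  # proportion passing item i
        sum_pq += p * (1 - p)

    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


# Hypothetical responses: five pupils, four items (1 = correct).
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3))  # 0.8 for this matrix
```

For this matrix the item p*q values sum to 0.8 and the total-score variance is 2.0, so KR-20 = (4/3)(1 - 0.4) = 0.8.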

But here, again, we encounter certain difficulties. These center on the one hand upon the definition of the universe and on the other upon the notion of randomness in sampling. In the first place, there are very definite constraints upon the items which make up our operational, as opposed to a purely hypothetical, universe. If we take the domain of vocabulary items as our example, we can specify what some of these constraints might be in an actual case. Firstly, there is typically a constraint upon the format of the item, most often to a 5-choice multiple-choice form. Secondly, there are constraints imposed by editorial policy, exemplified by the decision to exclude proper names or specialized technical terms, or by a requirement that the options call for gross rather than fine discriminations of shade of meaning. Thirdly, there are the constraints that arise out of the particular idiosyncrasies of the item writers: their tendency to favor particular types of words, or particular tricks of mislead construction. Finally, there are the constraints imposed by the item selection procedures: selection to provide a predetermined spread of item difficulties and to eliminate items failing to discriminate at a designated level. Thus, the universe is considerably restricted and hard to define, and the sampling from it is hardly to be considered random.

Presumably we could elaborate and delimit more fully the definition of the universe of items. Certainly, we could replace the concept of random sampling with one of stratified sampling, and indeed Cronbach has proposed that the sampling concept be extended to one of stratified sampling. But we may find that a really adequate definition of the universe from which we have sampled will become so involved as to be meaningless. We will almost certainly find that in proportion as we provide detailed specifications for stratification of our universe of items, and carry out our sampling within such strata, we are once again getting very close to a bill of particulars for equivalent tests. Just as random sampling is less efficient than stratified sampling in opinion surveys or demographic studies when stratification is upon relevant variables, so also random sampling of test items is less efficient than stratified sampling in making equivalent tests. Analytical techniques developed on the basis of random sampling assumptions will make a test appear less precise than it is as a representation of a population of tests which sample in a uniform way from different strata of the universe of items. It is partly in this sense that the Kuder-Richardson Formula 20 and the other formulas that try to estimate test reliability from item data, or from such test statistics as means and variances (which grow out of item data), are lower-bound estimates of reliability. They treat the sampling of items as random rather than stratified. They assume that differences in item factor composition either do not exist or are only such as arise by chance.
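The claim that KR-20 understates precision for a stratified test can be checked by simulation. The sketch below uses coefficient alpha (the general form of KR-20 for items not scored 0/1) and a stratified alpha that pools within-stratum error variance, one standard way of crediting the stratification. The data come from an assumed model, not from any real test: two uncorrelated content strata of six items each, every item equal to its stratum factor plus independent noise. All names are illustrative.

```python
import random
import statistics


def cronbach_alpha(scores):
    """Coefficient alpha for a persons x items matrix (generalizes KR-20)."""
    k = len(scores[0])
    item_vars = [statistics.pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = statistics.pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def stratified_alpha(scores, strata):
    """Alpha for a test built from strata; `strata` lists the item indices of each stratum."""
    total_var = statistics.pvariance([sum(row) for row in scores])
    error_var = 0.0
    for items in strata:
        sub = [[row[i] for i in items] for row in scores]
        stratum_var = statistics.pvariance([sum(r) for r in sub])
        error_var += stratum_var * (1 - cronbach_alpha(sub))
    return 1 - error_var / total_var


def simulate(n_persons=500, seed=7):
    """Assumed model: items 0-5 load on factor A, items 6-11 on an independent factor B."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_persons):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append([a + rng.gauss(0, 0.5) for _ in range(6)]
                    + [b + rng.gauss(0, 0.5) for _ in range(6)])
    return rows


scores = simulate()
strata = [list(range(6)), list(range(6, 12))]
print(f"alpha (items treated as one random sample): {cronbach_alpha(scores):.3f}")
print(f"stratified alpha:                            {stratified_alpha(scores, strata):.3f}")
```

Under this model the population values are about 0.87 for ordinary alpha and 0.96 for stratified alpha, so treating the item sample as random makes the test look noticeably less precise than a stratified view does.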

Sometimes the facts suggest that this may be approximately the case. Thus, Cronbach compared the values that he obtained for tests divided into random halves and those divided into judgmentally equivalent halves for a mechanical reasoning test, and found an average value of .810 for random splits and .820 for parallel splits. For a short morale scale the corresponding values were .715 and .737. But frequently a test is fairly sharply stratified: by difficulty level, by area of content, by intellectual process. When this is true, correlation estimates based on random sampling concepts may seriously underestimate those that would be obtained between two parallel forms of the test, and consequently may underestimate the precision with which a given test represents the stratified universe.
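Cronbach's comparison of random and judgmentally parallel splits can be imitated on simulated data. The sketch correlates half-test totals and steps the result up with the Spearman-Brown formula, 2r/(1 + r). It assumes a sharply stratified test (two independent content strata of six items each, invented data), averages over many random splits, and compares that average with a split that puts three items from each stratum in each half. Function names are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(scores, half):
    """Spearman-Brown corrected correlation between the items in `half` and the rest."""
    half = set(half)
    first = [sum(v for i, v in enumerate(row) if i in half) for row in scores]
    second = [sum(v for i, v in enumerate(row) if i not in half) for row in scores]
    r = pearson(first, second)
    return 2 * r / (1 + r)


def simulate(n_persons=500, seed=11):
    """Assumed model: items 0-5 measure factor A, items 6-11 an independent factor B."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_persons):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append([a + rng.gauss(0, 0.5) for _ in range(6)]
                    + [b + rng.gauss(0, 0.5) for _ in range(6)])
    return rows


scores = simulate()
rng = random.Random(3)
random_splits = [split_half(scores, rng.sample(range(12), 6)) for _ in range(200)]
parallel_split = split_half(scores, [0, 1, 2, 6, 7, 8])  # three items from each stratum

print(f"mean of random splits: {statistics.fmean(random_splits):.3f}")
print(f"parallel split:        {parallel_split:.3f}")
```

With this strong stratification the random splits average roughly 0.87 against roughly 0.96 for the balanced split; on a nearly homogeneous test, such as those in Cronbach's comparison, the two would come out much closer.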

These reactions to random sampling as applied to tests and test items were stimulated in part by Dr. Loevinger's presidential address to Division 5 at the recent APA meetings, and I gladly acknowledge the indebtedness, without holding her responsible for anything silly that I have said.

The shift in verbal formulation to a sampling formulation is compatible with a shift in mathematical models of reliability to analysis of variance and intraclass correlation models. These models have, of course, been proposed for more than 20 years, but they have been more systematically and completely expressed in the past decade.

The most comprehensive and systematic elaboration of this formulation of which I am aware is the one which has been distributed in rexographed form by Oscar Buros, and which is available [...] Press. I confess my own limitation when I say that I find this presentation pretty hard to follow. Hopefully, others of you will be either more familiar with or more facile at picking up the notation that Buros has used, and will be able to pick from the host of formulas that are offered the one that is appropriate to the specific data with which you are faced.

One great virtue of analysis of variance models is their built-in versatility. They can handle item responses that are scored 0 or 1, trial scores that yield scores with some type of continuous distribution, or, where more than one test has been given to each individual, scores for total tests. They can deal with the situation in which the data for each individual are generated by the same test or the same rater and also the situation in which test or rater varies from person to person. This latter situation is one of very real importance in many practical circumstances. How shall we judge the precision of a reported IQ when we do not know which of two or more forms of a test was given? How shall we appraise the repeatability of a course grade when the grade may be given by any one of the several different instructors who handle a course? If we have more than a single score for each individual, even though the scores are based on different tests or raters for each individual, we can get an estimate of within-persons variation. And whenever we have an estimate of within-persons variation we have a basis for judging the precision of a score or rating as describing a person. Clearly, with only two or three or four scores per person, the estimate of within-persons variance is very crude for a single individual. We must be willing to assume that the within-persons variance is sufficiently uniform from person to person for a pooling of data over persons to give us a usable common estimate of variance from test to test for each single individual. Having such an estimate, we can express reliability either as the precision of a score for an individual stated in absolute terms or as the precision of placement of an individual relative to his fellows.
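When each person's several scores come from different forms or raters, a one-way analysis of variance (persons as the only classification) pools the within-persons variation. The sketch below uses invented course grades, each student graded by three different instructors. It returns the pooled within-persons variance, its square root as an absolute standard error for a single grade, and the intraclass correlation for a single grade as the relative index. It assumes, as the text says we must, that within-persons variance is roughly uniform across persons; the function name is hypothetical.

```python
import statistics


def one_way_reliability(scores):
    """scores[p] holds k scores for person p, each from a different test form or rater.

    Returns (pooled within-persons variance, absolute SE of one score,
    intraclass correlation for one score).
    """
    n = len(scores)
    k = len(scores[0])
    grand = statistics.fmean(x for row in scores for x in row)
    means = [statistics.fmean(row) for row in scores]

    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))  # pooled over persons

    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    return ms_within, ms_within ** 0.5, icc


# Hypothetical course grades: each student graded by three different instructors.
grades = [
    [80, 84, 82],
    [70, 68, 75],
    [90, 88, 93],
    [60, 65, 61],
]
within_var, sem, icc = one_way_reliability(grades)
print(f"within-persons variance {within_var:.2f}, SE {sem:.2f}, ICC {icc:.3f}")
```

The standard error (about 2.75 grade points here) is the absolute statement of precision; the intraclass correlation (about .95) is the relative one.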

As various writers have shown, conventional Kuder-Richardson formulas emerge as special cases of the more general variance analysis formulas; likewise, the adjustments of correlational measures of reliability for test length are derivable from general variance analysis formulas.
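One way to see the special-case relationship is Hoyt's analysis-of-variance coefficient, computed from a persons-by-items analysis as 1 - MS(residual)/MS(persons); on 0/1 data it reproduces KR-20 exactly. The sketch also includes the Spearman-Brown adjustment for test length. The response matrix is invented for illustration and the function names are hypothetical.

```python
def hoyt(responses):
    """Reliability from a persons x items ANOVA: 1 - MS_residual / MS_persons."""
    n_p, n_i = len(responses), len(responses[0])
    grand = sum(map(sum, responses)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in responses]
    item_means = [sum(row[i] for row in responses) / n_p for i in range(n_i)]

    ss_persons = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_items = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in responses for x in row)
    ss_resid = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n_p - 1)
    ms_resid = ss_resid / ((n_p - 1) * (n_i - 1))
    return 1 - ms_resid / ms_persons


def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r = hoyt(data)
print(round(r, 3))                     # 0.8, the same value KR-20 gives for these data
print(round(spearman_brown(r, 2), 3))  # 0.889 if the test were doubled in length
```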

I shall not try to recite to you a set of formulas today, because this would serve no good purpose. Rather, let me direct you to Tryon's 1957 article in the Psychological Bulletin, Buros' available if unpublished material, and Cronbach's forthcoming article in the British Journal of Statistical Psychology. These, plus Horst's and Ebel's articles in Psychometrika, should give you all the formulas you can use.
[...] you the question of how much you are willing to pay for precision in a given measurement. The cost is partly one of time and expense. But, given some fixed limit on time and expense, the cost can then be a cost in scope and comprehensiveness. We can usually make gains in precision by increasing the redundance and repetitiveness of successive observations. The more narrowly a universe is defined, the more adequately a test sample of given length can represent it. With all due respect to the error of measurement, we must recognize that it is often the error of estimate that we are really interested in. To maximize prediction of socially useful events, it may be advantageous to sacrifice a little precision in order to gain a greater amount of scope. Precision and high reliability are, after all, a means rather than an end.

Some Current Developments in the Measurement
and Interpretation of Test Validity



Within the past decade, psychologists have been especially active in devising novel and imaginative approaches to test validity. In the time allotted, I can do no more than whet your appetite for these ideas, and I hope that you will be stimulated to examine the sources cited for an adequate exposition of each topic. I have selected five developments to bring to your attention. Ranging in scope from broad frameworks to specific techniques and from highly theoretical to immediately practical, these topics pertain to: construct validation, decision theory, moderator variables, synthetic validity, and response styles.

Construct Validation



It is nearly ten years since the American Psychological Association published its Technical Recommendations (1) outlining four types of validity: content, predictive, concurrent, and construct. As the most complex, inclusive, and controversial of the four, construct validity has received the greatest attention during the subsequent decade. When first proposed in the Technical Recommendations, construct validation was characterized as a validation of the theory underlying a test. On the basis of such a theory, specific hypotheses are formulated regarding the expected variations in test scores among individuals or among conditions, and data are then gathered to test these hypotheses. The constructs in construct validity refer to postulated attributes or traits that are presumably reflected in test performance. Concerned with a more comprehensive and more abstract kind of behavioral description, construct validation calls for a continuing accumulation of information from a variety of sources. Any data throwing light on the nature of the trait under consideration and the conditions affecting its development and manifestations contribute to the process of construct validation. Examples of relevant procedures include checking an intelligence test for the anticipated increase in score with age during childhood, investigating the effects of experimental variables such as stress upon test scores, and factor analyzing the test along with other variables.

Subsequently the concept of construct validity has been attacked, clarified, elaborated, and illustrated in a number of thoughtful and provocative articles by Cronbach and Meehl (14), Loevinger (30), Bechtoldt (6), Jessor and Hammond (28), Campbell and Fiske (11), and Campbell (10). In the most recent of these papers, Campbell (10) integrates much that had previously been written about construct validity and gives a well-balanced presentation of its contributions, hazards, and common misunderstandings. Referring to the earlier paper prepared jointly with Fiske (11), Campbell again points out that in order to demonstrate construct validity we need to show not only that a test correlates highly with other variables with which it should correlate but also that it does not correlate with variables from which it should differ. The former is described as convergent validation, the latter as discriminant validation.

In their multitrait-multimethod matrix, Campbell and Fiske (11) proposed a systematic experimental design for this twin approach to validation. Essentially what is required is the assessment of two or more traits by two or more methods. Under these conditions, the correlations of the same trait assessed by different methods represent a measure of convergent validity (these correlations should be high). The correlations of different traits assessed by the same or similar methods provide a measure of discriminant validity (these correlations should be low or negligible). In addition, the correlations of the same trait independently assessed by the same method give an index of reliability.
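The bookkeeping of a multitrait-multimethod matrix is easy to mechanize: each correlation is classified by whether its two measures share a trait, a method, or neither, and the categories are then compared. The sketch below uses two hypothetical traits ("anxiety", "dominance"), two hypothetical methods ("self-report", "peer rating"), and invented correlations; the reliability diagonal is left out of the example data.

```python
from statistics import fmean

# Hypothetical multitrait-multimethod data: two traits, two methods.
# Each measure is a (trait, method) pair; the correlations are invented.
SR_ANX = ("anxiety", "self-report")
SR_DOM = ("dominance", "self-report")
PR_ANX = ("anxiety", "peer rating")
PR_DOM = ("dominance", "peer rating")

correlations = {
    (SR_ANX, PR_ANX): 0.60,  # same trait, different methods
    (SR_DOM, PR_DOM): 0.55,
    (SR_ANX, SR_DOM): 0.20,  # different traits, same method
    (PR_ANX, PR_DOM): 0.25,
    (SR_ANX, PR_DOM): 0.10,  # different traits, different methods
    (SR_DOM, PR_ANX): 0.05,
}


def classify(a, b):
    """Campbell-Fiske category for the correlation between measures a and b."""
    same_trait, same_method = a[0] == b[0], a[1] == b[1]
    if same_trait and not same_method:
        return "convergent (monotrait-heteromethod)"
    if same_method and not same_trait:
        return "heterotrait-monomethod"
    if not same_trait and not same_method:
        return "heterotrait-heteromethod"
    return "reliability (monotrait-monomethod)"


def summarize(correlations):
    """Mean correlation in each category."""
    groups = {}
    for (a, b), r in correlations.items():
        groups.setdefault(classify(a, b), []).append(r)
    return {category: fmean(values) for category, values in groups.items()}


for category, mean_r in summarize(correlations).items():
    print(f"{category}: {mean_r:.3f}")
```

In this made-up matrix the convergent correlations (mean .575) exceed both kinds of discriminant comparison (.225 and .075), which is the pattern Campbell and Fiske ask for.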

Without attempting an evaluation of construct validity, for which I would refer you to the sources cited, I should nevertheless like to make a few comments about it. First, the basic idea of construct validity is not new. Some of the earlier tests were designed to measure such theoretical constructs as attention and memory, not to mention that most notorious of constructs, "intelligence." On the other hand, construct validity has served to focus attention on the desirability of basing test construction on an explicitly recognized theoretical foundation. Both in devising a new test and in setting up procedures for its validation, the investigator is urged to formulate psychological hypotheses. The proponents of construct validity have thus tried to integrate psychological testing more closely with psychological theory and experimental methods.

With regard to specific validation procedures, construct validation also utilizes much that is not new. Age differentiation, factorial validity, and the effect of such experimental variables as practice on test scores had been reported in test manuals long before construct validity was given a name in the Technical Recommendations. As a matter of fact, the methodology of construct validity is so comprehensive as to encompass even the procedures characteristically associated with other types of validity (see 2, Ch. 6). Thus the correlation of a mechanical aptitude test with subsequent performance on engineering jobs would contribute to our understanding of the construct measured by this test. Similarly, comparing the performance of neurotics and normals is one way of checking the construct validity of a test designed to measure anxiety. Nevertheless, construct validation has stimulated the search for novel ways of gathering validation data. Although the principal techniques currently employed to investigate construct validity have long been familiar, the field of operation has been expanded to admit a wider variety of procedures.

The very multiplicity of data-gathering techniques recognized by construct validity presents certain hazards. As Campbell puts it, the wide diversity of acceptable validational evidence "makes possible a highly opportunistic selection of evidence and the editorial device of failing to mention validity probes that were not confirmatory" (10, p. 551). Another hazard stems from misunderstandings of such a broad and loosely defined concept as construct validity. Some test constructors apparently interpret construct validation to mean content validity expressed in terms of psychological trait names. Hence they present as construct validity purely subjective accounts of what they believe (or hope) their test measures.

It is also unfortunate that the chief exponents of construct validity stated in one of their articles that this type of validation "is involved whenever a test is to be interpreted as a measure of some attribute or quality which is not 'operationally defined'" (14, p. 282). Such an assertion opens the door wider for subjective claims and fuzzy thinking about test scores and the traits they measure. Actually the theoretical construct or trait assessed by any test can be defined in terms of the operations performed in establishing the validity of the test. Such a definition should take into account the various external criteria with which the test correlated significantly, as well as the conditions that affect its scores. These procedures are entirely in accord with the positive contributions of construct validity. It would also seem desirable to retain the concept of criterion in construct validation, not as a specific practical achievement to be predicted, but as a general name for independently gathered external data. The need to base all validation on data rather than on armchair speculation would thus be re-emphasized, as would the need for data external to the test scores themselves.

Decision Theory

Even broader than construct validity in its scope and implications is the application of decision theory to test construction and evaluation (see 2, Ch. 7; 13; 25). Because of many technical complexities, however, the current impact of decision theory on test development and use is limited and progress has been slow.

Statistical decision theory was developed by Wald (37) with special reference to the decisions required in the inspection and quality control of industrial products. Many of its possible implications for testing have been systematically worked out by Cronbach and Gleser in their 1957 book on Psychological Tests and Personnel Decisions (13). Essentially, decision theory is an attempt to put the decision-making process into mathematical form, so that available information may be used to reach the most effective decisions under specified circumstances. The mathematical procedures required by decision theory are often quite complex, and few are in a form permitting their immediate application to practical testing problems. Some of the basic concepts of decision theory, however, can help in the formulation and clarification of certain questions about tests.

A few of these concepts were introduced in psychological testing before the formal development of statistical decision theory and were later recognized as fitting into that framework. One example is provided by the well-known Taylor-Russell Tables (36), which permit an estimate of the net gain in selection accuracy attributable to the use of a test. The information required for this purpose includes the validity coefficient of the test, the selection ratio, and the proportion of successful applicants selected without the use of the test. The rise in proportion of successful applicants to be expected from the introduction of the test is taken as an index of the test's effectiveness.
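The Taylor-Russell tables rest on an assumed bivariate normal relation between test and criterion. Under that assumption the expected proportion of successful persons among those selected can be computed directly from the three quantities named above. The sketch below does this with Python's standard library and a simple trapezoidal integration; it is meant to show the logic behind the tables, not to reproduce their printed values, and the example inputs are arbitrary.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def taylor_russell(validity, selection_ratio, base_rate, steps=4000, upper=8.0):
    """Expected proportion successful among those selected by the test.

    Assumes test and criterion are standard bivariate normal with correlation
    `validity` (|validity| < 1), top-down selection of the fraction
    `selection_ratio`, and "success" defined as the top `base_rate` of the
    criterion distribution (the proportion successful without the test).
    """
    x_cut = STD.inv_cdf(1 - selection_ratio)
    y_cut = STD.inv_cdf(1 - base_rate)
    cond_sd = (1 - validity ** 2) ** 0.5  # SD of criterion given test score
    h = (upper - x_cut) / steps
    total = 0.0
    for i in range(steps + 1):  # trapezoidal rule over x in [x_cut, upper]
        x = x_cut + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        p_success_given_x = 1 - STD.cdf((y_cut - validity * x) / cond_sd)
        total += weight * STD.pdf(x) * p_success_given_x
    return total * h / selection_ratio


for r in (0.0, 0.3, 0.5):
    print(f"validity {r}: {taylor_russell(r, 0.3, 0.5):.3f} successful among selected")
```

With zero validity the result is just the base rate (0.5 here); any positive validity raises it, and that rise is the index of the test's effectiveness described in the text.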

In many situations, what is wanted is an estimate of the effect of the test, not on the proportion of persons who exceed the minimum performance, but on the over-all output of the selected group. How does the level of criterion achievement of the persons selected on the basis of the test compare with that of the total applicant sample that would have been selected without the test? Following the work of Taylor and Russell, several investigators addressed themselves to this question. It was Brogden (8) who first demonstrated that the expected increase in output or achievement is directly proportional to the validity of the test. Doubling the validity of the test will double the improvement in output expected from its use. Following a similar approach (see 27), Brown and Ghiselli (9) prepared a table whereby the mean standard criterion score of the selected group can be estimated from a knowledge of test validity and selection ratio.
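Under the usual assumptions behind these results (normally distributed test scores, a linear regression of criterion on test, and selection of the top scorers), the mean standard criterion score of the selected group equals the validity coefficient times the ordinate of the normal curve at the cutting score, divided by the selection ratio. A sketch of that calculation, which makes Brogden's proportionality visible; the function name and inputs are illustrative:

```python
from statistics import NormalDist

STD = NormalDist()


def mean_criterion_z(validity, selection_ratio):
    """Expected mean standard criterion score of the selected group.

    Assumes normal test scores, linear regression of criterion on test,
    and top-down selection at the given selection ratio.
    """
    cutoff = STD.inv_cdf(1 - selection_ratio)
    return validity * STD.pdf(cutoff) / selection_ratio


for r in (0.2, 0.4):
    z = mean_criterion_z(r, 0.3)
    print(f"validity {r}: selected group averages {z:.3f} SD above the applicant mean")
```

Because validity enters only as a multiplier, doubling it doubles the expected gain at any fixed selection ratio.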

Decision theory incorporates a number of parameters not traditionally considered in evaluating the predictive effectiveness of tests. The previously mentioned selection ratio is one such parameter. Another is the cost of administering the testing program. Thus a test of low validity would be more likely to be retained if it were short, inexpensive, adapted for group administration, and easy to give. An individual test requiring a trained examiner and expensive equipment would need a higher validity to justify its retention. A further consideration is whether the test measures an area of criterion-relevant behavior not covered by other available techniques.
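Cost can be brought into the same calculation. One standard formulation, usually associated with Brogden and with Cronbach and Gleser, gives the expected gain from selection as the number selected, times the criterion standard deviation in dollars, times the mean standard criterion score of those selected, minus the total cost of testing all applicants. The sketch below compares two hypothetical tests; every number, including the dollar values, is invented.

```python
from statistics import NormalDist

STD = NormalDist()


def selection_utility(validity, selection_ratio, n_applicants, sd_dollars, cost_per_applicant):
    """Expected dollar gain over random selection, net of testing cost.

    Assumes normal test scores, linear regression of criterion on test,
    and top-down selection; sd_dollars is the criterion SD in dollars.
    """
    cutoff = STD.inv_cdf(1 - selection_ratio)
    n_selected = n_applicants * selection_ratio
    mean_z = validity * STD.pdf(cutoff) / selection_ratio
    return n_selected * sd_dollars * mean_z - n_applicants * cost_per_applicant


# Hypothetical comparison: a short group test vs. an individually given test.
group_test = selection_utility(0.30, 0.2, 100, 5000, cost_per_applicant=5)
individual_test = selection_utility(0.40, 0.2, 100, 5000, cost_per_applicant=200)
print(f"group test:      {group_test:,.0f}")
print(f"individual test: {individual_test:,.0f}")
```

With these made-up figures the cheaper, less valid group test yields the larger net gain (about 41,500 against 36,000), which is the text's point that an expensive individual test needs higher validity to justify its retention.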

Another major aspect of decision theory pertains to the evaluation of outcomes. The absence of adequate systems for assigning values to outcomes is one of the principal obstacles in the way of applying decision theory. It should be noted, however, that decision theory did not introduce the problem of values into the decision process, but merely made it explicit. Value systems have always entered into decisions, but they were not heretofore clearly recognized or systematically handled.

Still another feature of decision theory is that it permits a consideration of the interaction of different variables. An example would be the interaction of applicant aptitudes with alternative treatments, such as types of training programs to which individuals could be assigned. Such differential treatment would further improve the outcome of decisions based on test scores. Decision theory also focuses attention on the important fact that the effectiveness of a test for selection, placement, classification, or any other purpose must be compared not with chance or with perfect prediction, but with the effectiveness of other available predictors. The question of the base rate is also relevant here (33). The examples cited illustrate ways in which the application of decision theory may eventually affect the interpretation of test validity.

Moderator Variables

A promising recent development in the interpretation of test validity is the use of moderator variables (7, 19, 21, 22, 23, 24, 35). The validity of a given test may vary among subgroups or individuals within a population. Essentially, the problem of moderator variables is that of predicting these differences in predictability. In any bivariate distribution, some individuals fall close to the regression line, while others miss it by appreciable distances. We may then ask whether there is any characteristic in which those falling farther from the regression line, for whom prediction errors are large, differ systematically and consistently from those falling close to it. Thus a test might be a better predictor for men than for women, or for applicants from a lower than for applicants from a higher socioeconomic level. In such examples, sex and socioeconomic level are the moderator variables, since they modify the predictive validity of the test.
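A moderator variable shows up as a difference in validity coefficients when the test-criterion correlation is computed separately within subgroups. The sketch below simulates that situation under an assumed model (the subgroup labels "A" and "B", the slopes, and the noise levels are all invented) and computes the correlation for the whole group and within each subgroup.

```python
import random
import statistics


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(seed=5, n_per_group=300):
    """Assumed model: the criterion depends strongly on the test in group A, weakly in group B."""
    rng = random.Random(seed)
    records = []
    for group, slope in (("A", 0.8), ("B", 0.1)):
        for _ in range(n_per_group):
            test = rng.gauss(0, 1)
            noise_sd = (1 - slope ** 2) ** 0.5  # keeps criterion variance near 1
            criterion = slope * test + rng.gauss(0, noise_sd)
            records.append((group, test, criterion))
    return records


def validity_by_group(records):
    """Test-criterion correlation within each subgroup and for everyone combined."""
    result = {}
    for group in sorted({g for g, _, _ in records}):
        tests = [t for g, t, _ in records if g == group]
        crits = [c for g, _, c in records if g == group]
        result[group] = pearson(tests, crits)
    result["all"] = pearson([t for _, t, _ in records], [c for _, _, c in records])
    return result


for group, r in validity_by_group(simulate()).items():
    print(f"{group}: r = {r:.2f}")
```

The combined coefficient (around .45 under this model) conceals two very different validities, roughly .8 in one subgroup and .1 in the other; the subgroup membership is the moderator.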

Even when a test is equally valid for all subgroups, the same score may have a different predictive meaning when obtained by members of different subgroups. For example, if two students with different educational backgrounds obtain the same score on the Scholastic Aptitude Test, will they do equally well in college? Or will the one with the poorer or the one with the better background excel? Moderator variables may thus influence cutoff scores, regression equation weights, or validity coefficients of the same test for different subgroups of a population.

Interests and motivation often function as moderator variables
in individual cases. If an applicant has little interest in a job,
he will probably do poorly regardless of his scores on relevant
aptitude tests. Among such persons, the correlation between
aptitude test scores and job performance would be low. For
individuals who are interested and highly motivated, on the other
hand, the correlation between aptitude test score and job success
may be quite high. From another angle, personality inventories like
the MMPI may have higher validity for some types of neurotics than
for others, because the characteristic behavior of the two types may
make one more careful and accurate in reporting than the other.

A moderator variable may itself be a test score, in terms of which
individuals may be sorted into subgroups. There have been some
promising attempts to identify such moderator variables in test
scores (7, ...). In a study of taxi drivers conducted by Ghiselli
(21), the correlation between an aptitude test and a criterion of
job performance was low in the total group. The group was then
sorted into thirds on the basis of scores on an occupational
interest inventory. When the validity of the aptitude test was
recomputed within the third whose occupational interest level was
most appropriate for the job, it rose substantially. Such findings
suggest that one test might first be used to screen out individuals
for whom the second test is likely to have low validity; then, from
among the remaining cases, those scoring high on the second test
are selected.
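The two-stage procedure suggested by these findings can be sketched as follows. This is a hypothetical illustration: the `Applicant` record, the cutoff of 5 on the moderator scale, and the applicant data are all invented, and in practice any such cutoff would have to be established empirically.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    moderator: float  # e.g. interest-inventory score (hypothetical scale)
    aptitude: float   # score on the test whose validity is moderated


def two_stage_select(applicants, moderator_cutoff, n_to_select):
    """Screen on the moderator, then pick the top aptitude scorers.

    Stage 1 drops applicants whose moderator score falls below the
    cutoff, i.e. those for whom the aptitude test is expected to have
    low validity. Stage 2 ranks the rest on aptitude and keeps the top n.
    """
    screened = [a for a in applicants if a.moderator >= moderator_cutoff]
    screened.sort(key=lambda a: a.aptitude, reverse=True)
    return screened[:n_to_select]


pool = [
    Applicant("P1", moderator=8, aptitude=55),
    Applicant("P2", moderator=3, aptitude=90),  # high aptitude, screened out
    Applicant("P3", moderator=7, aptitude=70),
    Applicant("P4", moderator=9, aptitude=62),
]
chosen = two_stage_select(pool, moderator_cutoff=5, n_to_select=2)
print([a.name for a in chosen])  # ['P3', 'P4']
```

Applicant P2 has the highest aptitude score but is removed in the first stage, because the aptitude test is assumed to carry little predictive meaning for applicants below the moderator cutoff.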

Even within a single test, such as a personality inventory, it
may prove possible to develop a moderator key in terms of which
the validity of the rest of the test for each individual may be
assessed (24). There is also evidence suggesting that
intra-individual variability from one part of the test to another
affects the predictive validity of a test for individuals (7). Those
individuals for whom the test is more reliable (as indicated by low
intra-individual variability) are also the individuals for whom it
is more valid, as might be anticipated.
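As a rough illustration of intra-individual variability, the sketch below computes, for each examinee, the standard deviation of his scores across parts of a single test and splits the group at the median of that index. The four-part division, the use of the population standard deviation as the index, the median split, and the scores are all assumptions made for the example; the cited study (7) may have used a different measure.

```python
import statistics


def intra_individual_sd(part_scores):
    """Population SD of one examinee's scores across parts of a test."""
    return statistics.pstdev(part_scores)


def split_by_consistency(examinees):
    """Split examinees at the median intra-individual SD.

    Returns (consistent, variable): names at or below the median SD and
    names above it. On the findings described above, the test would be
    expected to be more valid for the first group.
    """
    sds = {name: intra_individual_sd(parts) for name, parts in examinees.items()}
    cut = statistics.median(sds.values())
    consistent = sorted(n for n, s in sds.items() if s <= cut)
    variable = sorted(n for n, s in sds.items() if s > cut)
    return consistent, variable


# Hypothetical scores on four parts of one test.
examinees = {
    "E1": [12, 13, 12, 13],
    "E2": [5, 19, 8, 18],
    "E3": [10, 10, 11, 10],
    "E4": [20, 6, 15, 9],
}
print(split_by_consistency(examinees))  # (['E1', 'E3'], ['E2', 'E4'])
```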

A technique devised to meet a specific practical need is synthetic
validity (4, 29, 34). It is well known that the same test may have
high validity for predicting the performance of office clerks or
machinists in one company, and low or negligible validity for jobs
bearing the same title in another company. Similar variation has
been found in the correlations of tests with achievement in courses
of the same name given in different colleges. The criterion of
"college success" is a notorious example of both complexity and
heterogeneity. Although traditionally identified with grade-point
average, college success can actually mean many different things,
from being elected president of the student council or captain of
the football team to receiving Phi Beta Kappa in one's junior year.
Individual colleges vary in the relative weights they give to these
different criteria of success.

It is abundantly clear that (1) educational and vocational
criteria are complex; (2) the various criterion elements, or
subcriteria, for any given job, educational institution, course, etc.
may have little relation to each other; and (3) different criterion
situations bearing the same name often represent a different
combination of subcriteria. It is largely for these reasons that
test users are generally urged to conduct their own local validation
studies. In many situations, however, this practice may not be
feasible for lack of time, facilities, or adequate samples. Under
these circumstances, synthetic validity may provide a satisfactory
approximation of test validity against a particular criterion. First
proposed by Lawshe (29) for use in industry, synthetic validity has
been employed chiefly with job criteria, but it is equally applicable
to educational criteria.

In synthetic validity, each predictor is validated, not against
a composite criterion, but against job elements identified through
job analysis. The validity of any test for a given job is then
computed synthetically from the weights of these elements in the
job and in the test. Thus if a test has high validity in predicting
performance in delicate manipulative tasks, and if such tasks
loom large in a particular job, then the test will have high
synthetic validity for that job. A statistical technique known as the
J-coefficient (for job-coefficient) has been developed by Primoff
(34) for estimating the synthetic validity of a test. This technique
offers a possible tool for generalizing validity data from one job
or other criterion situation to another without actually conducting
a separate validation study in each situation. The J-coefficient
may also prove useful in ordinary battery construction as an
intervening step between the job analysis and the assembling of
a trial battery of tests. The preliminary selection of appropriate
tests is now done largely on a subjective and unsystematic basis
and might be improved through the utilization of such a technique
as the J-coefficient.
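Primoff's own J-coefficient computations are not reproduced here. As a simplified stand-in, the following sketch applies the standard formula for the correlation between a test and a weighted composite of standardized job elements, r(t, C) = Σ w_i r_ti / sqrt(Σ_i Σ_j w_i w_j r_ij), where w_i is the weight of element i in the job, r_ti the validity of the test against element i, and r_ij the element intercorrelations. Treating this as equivalent to the J-coefficient is an assumption, and the element weights, validities, and intercorrelations are hypothetical.

```python
import math


def synthetic_validity(weights, test_element_r, element_r):
    """Correlation of a test with a weighted composite of job elements.

    weights        -- importance of each element in the job (hypothetical)
    test_element_r -- validity of the test against each element
    element_r      -- element intercorrelation matrix (1.0 on the diagonal)
    Assumes the elements are standardized before weighting.
    """
    k = len(weights)
    covariance = sum(w * r for w, r in zip(weights, test_element_r))
    composite_var = sum(
        weights[i] * weights[j] * element_r[i][j]
        for i in range(k)
        for j in range(k)
    )
    return covariance / math.sqrt(composite_var)


# Two hypothetical job elements: delicate manipulation and clerical checking.
test_r = [0.60, 0.10]                    # test validity against each element
elements_r = [[1.0, 0.2], [0.2, 1.0]]    # element intercorrelations

assembler = synthetic_validity([3, 1], test_r, elements_r)  # manipulation dominant
clerk = synthetic_validity([1, 3], test_r, elements_r)      # checking dominant
print(round(assembler, 2), round(clerk, 2))  # 0.57 0.27
```

The same test earns a higher synthetic validity for the job in which manipulative tasks loom large than for the job dominated by clerical checking, which mirrors the reasoning of the example in the text.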
Response Styles

The fifth and last development I should like to bring to your
attention pertains to response styles. Although research on
response styles has centered chiefly on personality inventories, the
concept can be applied to any type of test. Interest in response
styles was first stimulated by the identification of certain
test-taking attitudes which might obscure or distort the traits that
the test was designed to measure. Among the best known is the
social desirability variable, extensively investigated by Edwards
(15, 16, 17, 18). This is simply the tendency to choose socially
desirable responses on personality inventories. To what extent
this variable should also reflect the tendency to choose common
responses is a matter on which different investigators disagree.
Other examples of response styles include acquiescence, or the
tendency to answer "yes" rather than "no" regardless of item
content (3, 5, 12, 20); evasiveness, or the tendency to choose
question marks or other indifferent responses; and the tendency
to utilize extreme response categories.
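Several of the response styles just listed amount to counts of how often a respondent uses a particular kind of answer, regardless of item content. The sketch below computes three such simple indices. It is an illustrative assumption, not the scoring procedure of any published scale, and the sample responses are invented.

```python
def response_style_indices(responses):
    """Proportions of 'Y' (acquiescence) and '?' (evasiveness) answers.

    responses -- sequence of 'Y', 'N', or '?' for items of mixed content.
    Item content is deliberately ignored, since a response style is a
    tendency that operates regardless of what the item says.
    """
    n = len(responses)
    return {
        "acquiescence": responses.count("Y") / n,
        "evasiveness": responses.count("?") / n,
    }


def extremity_index(ratings, low=1, high=5):
    """Proportion of ratings falling in the two end categories of a scale."""
    return sum(r in (low, high) for r in ratings) / len(ratings)


print(response_style_indices(list("YYY?NYY?")))  # {'acquiescence': 0.625, 'evasiveness': 0.25}
print(extremity_index([1, 5, 3, 5, 2]))          # 0.6
```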

We can recognize two stages in research on response styles.
First there was the recognition that stylistic components of item
form exert a significant influence upon test responses. In fact,
a growing accumulation of evidence indicated that the principal
factors measured by many self-report inventories were stylistic
rather than content factors. At this stage, such stylistic variance
was regarded as error variance, which would reduce test validity.
Efforts were therefore made to rule out these stylistic factors
through the reformulation of items, the development of special
keys, or the application of correction formulas.
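One familiar kind of special key is a balanced key, in which half the items are keyed "yes" and half "no". The text does not say which keys it has in mind, so this is offered only as an illustration. The sketch below, using a hypothetical ten-item scale, shows why balancing blunts acquiescence: a respondent who answers "yes" to everything reaches the maximum on an all-"yes" key but lands at the midpoint on a balanced one.

```python
def keyed_score(responses, key):
    """Number of responses that agree with the scoring key ('Y' or 'N')."""
    return sum(r == k for r, k in zip(responses, key))


unbalanced_key = ["Y"] * 10       # every item keyed "yes"
balanced_key = ["Y", "N"] * 5     # half keyed "yes", half keyed "no"
yeasayer = ["Y"] * 10             # answers "yes" regardless of content

print(keyed_score(yeasayer, unbalanced_key))  # 10: maximum trait score
print(keyed_score(yeasayer, balanced_key))    # 5: pushed to the midpoint
```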

More recently there has been an increasing realization that
response styles may be worth measuring in their own right. This
point of view is clearly reflected in the reviews by Jackson and
Messick (26) and by Wiggins (38), published within the past
five years. Rather than being regarded as measurement errors
to be eliminated, response styles are now being investigated as
diagnostic indices of broad personality traits. The response
style that an individual exhibits in taking a test may be
associated with characteristic behavior he displays in other,
nontest situations. Thus the tendency to mark socially desirable
answers may be related to conformity and stereotyped
conventionality. It has also been proposed that a moderate degree
of this variable is associated with a mature, individualized self
concept, while higher degrees are associated with intellectual and
social immaturity (31, 32). With reference to acquiescence, there
is some suggestive evidence that the predominant "yeasayers" tend
to have weak ego controls and to accept impulses without
reservation, while the predominant "naysayers" tend to inhibit
and suppress impulses and to reject emotional stimuli (12).

The measurement of response styles may provide a means
of capitalizing on what initially appeared to be the chief weakness
of self-report inventories. Several puzzling and disappointing
results obtained with personality inventories seem to make
sense when re-examined in the light of recent research with
response styles. Much more research is needed, however, before
the measurement of response styles can be put to practical use.
We need more information on the relationships among different
response styles, such as social desirability and acquiescence,
which are often confounded in existing scales. We also need to
know more about the interrelationships among different scales
designed to measure the same response style. And above all,
we need to know how these stylistic scales are related to external
criterion data.

The five developments cited in this paper represent ongoing
activities. It is premature to evaluate the contribution any of
them will ultimately make to the measurement or interpretation
of test validity. At this stage, they all bear watching and they
warrant further exploration.
Ten years have now passed since the National Board of Medical
Examiners came to Educational Testing Service for advice and
help in converting our time-honored essay tests to the more
modern technique of multiple-choice testing. The change was
accompanied by the cries of those who chose not to understand
multiple-choice testing and the criticism of those who,
understanding the tests, still did not like them. Nevertheless, with
the assistance of those such as John Cowles, then a member of ETS,
convincing evidence soon accumulated to demonstrate the gains
that had been achieved in the reliability and validity of our
written examinations (1, 2). The new examinations prospered,
and after relying heavily upon the experience, the facilities, and
the excellence of ETS for a period of five years, we were bold
enough to strike out on our own. We added to our staff highly
qualified individuals from the field of psychometrics, and now,
after another five years, we have welcomed this opportunity to
return to ETS at this Conference to describe and, not without
some trepidation, to ask for your critical comments about a
testing method developed by the National Board.

I have chosen for the title of this presentation "Programmed
Testing." Let me make it clear, however, that I do not wish to
become involved in prevalent debates over programmed teaching.
Rather, this title is intended to suggest that this new testing
method has certain features that are similar to those of
programmed teaching. Whether one follows Skinner down the linear
path or prefers the branching program method of Crowder, the
essential characteristic of programmed teaching, with or without
machines, appears to be a step-by-step progression toward
carefully constructed goals. Each step calls for specific knowledge.
The student must already have the knowledge or he must master
it before he may progress to the next step. Similarly, in the
testing method that I wish to describe to you, the examinee
proceeds in a step-by-step fashion through a sequential unfolding
of a series of problems. It is this feature that has, we believe,
justified the terminology of our title, "Programmed Testing."

Since any test must be viewed in the light of the purpose for
which it is designed, let me summarize briefly the objectives of
National Board examinations. Our primary objective is to
determine the qualification of individual physicians for the
practice of medicine. A physician, having successfully completed the
extensive series of National Board examinations, may present
his certificate to the licensing authority of the state in which he
wishes to practice and obtain his license without further
examination. If the physician has not elected to take National Board
examinations, he must go before the state medical examining
board, and if, later in his medical career, he should move to
another state, he may be required to repeat this performance,
perhaps years after he had thought to leave qualifying examinations
far behind him. National Board certification is a permanent
record and, with few exceptions, permits physicians to move
from one state to another without repeated examinations.

This was the initial purpose of the National Board, but it is
not its only function. Following the change to multiple-choice
testing, and as the reliability, validity, and impartiality of these
examinations gained recognition, medical school faculties began
to see in these examinations a means of measuring their students
class by class and subject by subject, and of comparing the
performance of their students in considerable detail with the
performance of other medical school classes across the country.

Our examinations are set up in three parts. Part I is a
comprehensive two-day examination in the basic sciences, usually
taken at the time a medical student is completing his second
year of medical school. Part II is a two-day examination in
the clinical sciences, designed for the student at the end of the
fourth and final year of the medical school curriculum. The third
and final part, our Part III, is designed for those who have
passed Parts I and II, who have finished their formal medical
school courses, have acquired the M.D. degree, and have had
some intern experience. It is this Part III examination that is the
subject of this presentation today.

Whereas Parts I and II are looked upon as searching tests
of knowledge and of the candidate's ability to apply his knowledge
to the problem in hand, the Part III examination is designed to
measure those attributes of the well-trained physician that, rather
glibly, we call clinical competence. It has been the long-standing
conviction of the National Board that, before we certify an
individual to a state licensing board as qualified for the practice
of medicine, we should, if we can, test his ability as a responsible
physician. Can he obtain pertinent information from a
patient? Can he detect and properly interpret abnormal signs
and symptoms? Is he then able to arrive at a reasonable diagnosis?
Does he show good judgment in the management of patients?

Historically, the National Board sought to answer these
questions by means of a practical bedside type of oral examination
based upon the candidates' examination of carefully selected
patients. In earlier days, with few candidates and few examiners,
this procedure was effective. More recently, with thousands of
candidates, thousands of patients, and thousands of examiners,
we found ourselves running into a difficulty that you will be
quick to recognize. We were dealing with three variables: the
examinee, the patient, and the examiner. Here we had two
variables, the patient and the examiner, that we were unable to
control at the bedside in order to obtain a reliable measurement
of the examinee.

Approximately four years ago, we felt compelled to face up
to the necessity of developing a better test of clinical competence
or admitting defeat and abandoning the effort. Therefore, we
undertook a two-year project with the support of a research
grant from the Rockefeller Foundation and the cooperative help
of the American Institute of Research. The first step in this
project was to obtain a realistic definition of the skills that are
involved in clinical competence at the intern level, since it is this
level of competence that our Part III examination is intended to
measure. The method used for this definition was the critical
incident technique, under the direct guidance of Dr. John Flanagan.
By interviews and by mail questionnaires, senior physicians,
junior physicians, and hospital residents throughout the country
were asked to record clinical situations in which they had
personally observed interns doing something that impressed them,
on the one hand, as an example of good clinical practice or,
on the other hand, as an example of conspicuously poor clinical
practice. A total of 3,300 such incidents were collected from
approximately 600 physicians. This large body of information
provided a rich collection of factual information that constituted
a profile of the actual experience of interns. We had arrived
at a well-documented answer to the question of what to test.
The next step, and a formidable one, was to determine how to
test the designated skills and behaviors of interns.

Many methods were explored. Motion pictures of carefully
selected patients were introduced to eliminate the two variables
that had vexed us in the traditional bedside performance. The
patient, projected on the screen, became constant, and the
examiner appeared in the form of pretested multiple-choice
questions about the patient. This method has stood up well under
the test of usage and continues as a part of the total examination.

A second method, which we have called programmed testing,
was evolved to test the intern in a realistic clinical situation
where he is called upon to face the unpredictable, dynamic
challenge of the sick patient. In real life, the intern may be called
to see a patient who, let us say, has just been admitted to the
medical ward of the hospital. The intern sees the patient and
studies the problem; he obtains information from the patient;
he performs a physical examination; and he must then decide




John P. Hubbard



upon a course of action. He orders certain laboratory studies,
and the results of these studies may then lead him to definitive
treatment. The patient's condition may improve, or perhaps
worsen, or be unaffected by the treatment. The situation has
changed; a new problem evolves; and again decisions and
actions are called for in the light of new information and altered
circumstances.

In the testing method, as we have developed it, a set of some
four to six problems related to a given patient simulates this
real-life situation in a sequential, programmed pattern. The
problems are based upon actual medical records, and may
follow the patient's progress for a period of several days, several
weeks, or even months until eventually, as in real life, the patient
improves and is discharged from the hospital, or possibly has
died and ends up on the autopsy table. At each step of the way,
the examinee is required to make decisions; he immediately
learns the results of his decisions and, with additional information
at hand, proceeds to the next problem.

I believe at this point you are detecting certain similarities
between the design of this test and the methods of programmed
teaching: a step-by-step progression to the goal, each step
accompanied by an increment of information upon which the
next step depends.

Essential to the methodology of this form of testing, as in the
case of programmed teaching, is the concealing of additional
information until the examinee has made his decision and has
earned the right to have the additional information. We,
therefore, first turned to the tab test method. But the tab test, with
the tearing off of bits of paper to reveal the underlying
information, seemed to us difficult to produce for mass testing and
awkward for the examinee.

We also gave serious consideration to the technique for the
testing of diagnostic skills described by Rimoldi in a series of
papers (3). Again, a clinical situation is presented and the
candidate is offered a number of steps that might be taken to arrive
at the correct diagnosis. Each choice appears on a separate card,
on the back of which is information pertinent to the selected
choice. In Rimoldi's hands, the scoring of this test depends not
only upon the nature of the choices selected but also upon the order
in which the choices are made. This test, although it has many
interesting features, also appeared difficult to handle for mass
testing and, furthermore, does not altogether meet our objectives
for a thorough evaluation of diagnostic acumen and judgment
in the management of patients.

We then devised the present method, the idea for which has
recognizable origins in the tab test and the Rimoldi test but uses
a different technique. Instead of tearing off bits of paper or
flipping over cards to find the appropriate information, we have
concealed the information under an erasable ink overlay. The
examinee first studies the problem and a carefully prepared list
of possible courses of action. He then makes his decisions and
turns to a separate answer booklet, where he finds a series of
inked blocks, each numbered to correspond to the given choices.
He removes the ink for selected choices with an ordinary pencil
eraser, and the results of his decisions are revealed.

At first we had considerable difficulty in finding a printing
technique that would permit erasure of the overlying ink block
without, at the same time, erasing the underlying information.
With the help of an interested specialty printer, a method was
developed that has the genius of simplicity. The answers (the
results of decisions) are printed and numbered serially in an
answer book. A thin acetate layer is laminated on the pages of
the answer book, and on top of the cellophane layer, blocks of
ink are applied to cover the underlying printing. The ink is of
a special formula so that, when dry, it can be removed easily
by an eraser or scraper. The acetate interphase layer protects
the underlying printing.

The method is readily adaptable to mass testing and also has
the advantage of being foolproof for scoring purposes. The
examinee has no way of putting ink back over an answer. If, when
he sees the results of his decision, he finds that he has made
a wrong choice, or if mistaken choices become apparent as the
solution to the problem unfolds, he is stuck with the choices he
has made. He cannot change his answers, and he cannot cheat
by peeking ahead under tabs or on the back of cards. His
responses, whether right or wrong, are clearly apparent for the
scorer to count.
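The bookkeeping property described here, that every erasure is permanent and visible to the scorer, can be modelled in a few lines. This toy Python class is an illustration under stated assumptions, not a description of any National Board procedure or software; the class and method names are hypothetical, and the two hidden results are taken from the sample diabetic-coma problem described in this paper.

```python
class AnswerBooklet:
    """Toy model of an erasable-ink answer booklet (hypothetical names).

    Each numbered block hides a printed result. Erasing reveals it, and
    there is deliberately no method for re-covering a block, so the
    record of erased numbers is permanent, as on the printed booklet.
    """

    def __init__(self, hidden_results):
        self._hidden = dict(hidden_results)  # choice number -> result text
        self._erased = []                    # order in which blocks were erased

    def erase(self, number):
        """Reveal the result under block `number` and record the choice."""
        if number not in self._erased:
            self._erased.append(number)
        return self._hidden[number]

    @property
    def erased(self):
        """Choices made so far, as an immutable tuple for the scorer."""
        return tuple(self._erased)


booklet = AnswerBooklet({
    4: "Urine: large amounts of glucose and acetone",
    5: "Pressure and cell count normal",
})
print(booklet.erase(4))
print(booklet.erase(5))
print(booklet.erased)  # (4, 5)
```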

I shall have more to say about scoring, but first a word
about content. The complexities of the clinical situations
contained in these tests are such as to make them very difficult to
describe, especially, I might add, for a non-medical audience.
If, however, I may take a leaf from ETS tests, an over-simplified
example may be helpful. I have frequently seen in ETS tests
an over-simplified example of a multiple-choice item: Chicago
is (A) a state, (B) a city, (C) a country, (D) a continent, (E) a
village. Just as ETS would, I am sure, resent any implication
that this gives a fair impression of the potential of
multiple-choice testing, so too the over-simplified example we use for
purposes of instruction to the examinee is not to be looked upon
as any indication of the difficulty and complexity of the
problems in the actual test.

Figure 1, the front cover of one of these tests, is shown to
indicate that carefully worded instructions are read aloud by the
proctor at the beginning of the test. The candidate is told that
the test is based upon his judgment in the management of
patients. He is told that initial information is given for each
patient and that, following the initial information, a numbered
list of possible courses of action constitutes the first problem for
this patient. He is not told how many courses of action are
considered correct; his task is to select those courses of action that
he judges to be important for the proper management of this
patient at this point in time. After he has arrived at a decision
on a course of action, he must turn to the separate answer booklet
and erase the ink rectangle numbered to correspond to his
choice, and the result of his action will appear under the erasure.
He is told that information will appear under the erasure for
incorrect as well as correct choices. If, for example, he has
ordered a diagnostic test, the result of the test will appear under
his erasure whether or not the selected test should have been
ordered. After having completed the first problem for the first
patient, he then goes on to the second problem for the same
patient.

Figure 2 shows the start of the test booklet. Before beginning
the test proper, the examinee is asked to familiarize himself with
the method and to practice on two simplified problems related to
one patient. At the top of the page is a brief description of a
patient who is brought to the emergency room of the hospital in a
comatose condition. From the information given, any medical
student would recognize the coma as due to diabetes, just as any
student would recognize that Chicago is a city. The first problem
for this patient then offers nine courses of action for immediate
decision. Three of these nine choices constitute proper management
at this point: selection of these three, and only these three,
choices leads to a perfect score for this problem. For this sample
problem, the key (the perfect score) is included on the page in
order to give the candidate some feeling of confidence in his
understanding of the method.
Figure 3 is an enlargement of choices 3, 4, and 5 in this list.
Choice No. 4 is one of the essential procedures. It is shown here
with the erasure having been made and the answer revealed. The
examinee has decided to catheterize the patient to test the urine,
and the urine is found to contain large amounts of glucose and
acetone, characteristic of the condition with which he is dealing.
Let us now assume that he did not recognize diabetes as the
cause of the patient's coma and decided to select choice No. 5
and to perform a lumbar puncture. His erasure for this choice
would reveal the words "pressure and cell count normal." Thus,
in a very realistic fashion, we are simulating a situation with
which an intern might be confronted in the middle of the night,
when the decisions are entirely his own, with no other physician
looking over his shoulder and saying "No, do not do a lumbar
puncture." He makes his decision, he performs the action, and
he obtains the results. He may or may not realize as the
problem unfolds that the procedure was unnecessary or
ill-advised.

FIGURE 2. Start of Test Booklet with ...

Having made his decisions for the immediate steps to be
undertaken for this patient, he then proceeds to the second problem.
Figure 4 shows an enlargement of items 10, 11, and 12 appearing
in this second problem. Item 12 reads "Order insulin." This is
a correct decision, arising from the information that he should
have uncovered in the first problem; he erases the corresponding
block and sees that the patient's condition improves as a
result of his action. But he is also given the opportunity to order
other medication as, for example, in Item 10, digitalis. This is
an incorrect choice that would reflect error in the first problem;
if he should order digitalis, he would find, as revealed under
his erasure, that digitalis is given in accordance with his orders
and the patient's condition worsens. In the actual internship
situation, it might not be until the following morning, when the
patient is seen by the Chief of Service, that the intern learns of his
error in ordering digitalis. The members of our Test Committee,
who are physicians prominent in their respective fields and with
considerable experience in this manner of testing, sometimes
become [...] the examinee's actions, particularly with regard to
incorrect decisions. One examiner suggested that if the examinee
selected a choice that would be considered a fatal error, he should
find under the erasure "You have just killed your patient; go on to
the next patient."

Now, to turn to the scoring of this test, let me remind you,
as stated earlier, that the basic function of National Board
examinations is to serve as a qualification for the general practice
of medicine. After having passed the final test of clinical
competence in Part III, the candidate is certified, and we say in
effect: We have examined this individual as carefully as we know
how and we consider him qualified to assume responsibility for
the medical care of patients. Therefore, although we are interested
in excellence, we are mainly concerned with the identification
of those few who cannot be considered safe to practice on their
own. The focus of this examination is, therefore, on the lower
end of the distribution curve.

After having carefully studied several different formulae for
the scoring, we arrived at an error scoring to count both sins
of omission and sins of commission. Each of the several hundred
choices of courses of action offered in the test is classified
as to whether it definitely must be done for the well-being of the
patient, or whether it should definitely not be done and, if done,
would be a serious error in judgment that might be harmful to
the patient. A third category includes choices of action that are
relatively unimportant, procedures that might be done or might
not be done, depending upon local conditions and customs. A
candidate who fails to select a choice considered mandatory or
who selects a choice considered harmful receives an error score.
The choices in the equivocal middle ground receive no score.
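The error-scoring rule can be stated compactly in code. In this sketch the three categories follow the text (mandatory, harmful, equivocal), but the nine-choice key itself is invented for illustration; only the counting rule comes from the description above.

```python
MANDATORY, HARMFUL, EQUIVOCAL = "mandatory", "harmful", "equivocal"


def error_score(key, selected):
    """Count sins of omission plus sins of commission.

    key      -- maps each choice number to MANDATORY, HARMFUL, or EQUIVOCAL
    selected -- collection of choice numbers the candidate erased
    """
    selected = set(selected)
    omissions = sum(1 for c, cat in key.items() if cat == MANDATORY and c not in selected)
    commissions = sum(1 for c, cat in key.items() if cat == HARMFUL and c in selected)
    return omissions + commissions  # equivocal choices never count


# Hypothetical key for a nine-choice problem with three mandatory actions.
key = {1: MANDATORY, 2: EQUIVOCAL, 3: HARMFUL, 4: MANDATORY, 5: HARMFUL,
       6: EQUIVOCAL, 7: MANDATORY, 8: HARMFUL, 9: EQUIVOCAL}

print(error_score(key, {1, 4, 7}))     # 0: the three mandatory choices only
print(error_score(key, {1, 4, 5, 6}))  # 2: omitted 7, committed 5
```

Lower scores are better, and a perfect performance scores zero, which fits the examination's focus on identifying candidates at the lower end of the distribution.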

Thus, we are dealing with a test and a scoring procedure that
are quite different from the usual multiple-choice method, in which
the examinee is offered a number of choices and directed to
select the one best response. Here we offer him a number of
choices and require him to use his best judgment in selecting
those that he considers important for the management of the
patient. Usually, as in a practical situation, he must
recognize a number of actions that should definitely be done
and other actions that should definitely not be done. His
responses are, therefore, interrelated. If he is on the right track, he
makes a number of correct decisions among the available
choices; then, by his erasures, he gains the information necessary
for the management of the patient in the next problem
and the next set of choices. If he starts off on the wrong track
in this programmed test, he may compound his mistakes as he
proceeds, and he may become increasingly dismayed as he learns
from his erasures the error of his ways. But if he discovers
that he is on the wrong track, he has a chance to change his
course, although he cannot undo the mistakes he has already
made: again a situation rather true to life.

Finally, a brief summary of the statistical analysis of this testing procedure. As you have probably already noted, both the structure of the test and the manner in which we are now scoring it are such as to affect the reliability adversely. In our desire to simulate real-life situations, we have included within each problem a varying number of interrelated responses. Furthermore, there is interdependency between one problem and the next. To return to the two sample problems, anyone who knows anything about diabetic coma would decide to do the three procedures coded as correct for the first problem, and he would avoid other procedures coded as incorrect. Then, having confirmed the diagnosis in the first problem, he would have no doubt about the further management of the patient in the second problem. The interdependency of responses within each problem and from one problem to the next has the effect of decreasing the number of independent points upon which the test score is built and, consequently, decreasing the reliability of the test.
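The point about interdependent responses can be made concrete with a rough model borrowed from survey sampling (Kish's design effect). The Board does not report using this model. Suppose responses arrive in clusters of m with an average within-cluster correlation rho. Then n responses carry about as much information as n / (1 + (m - 1) * rho) independent ones. The cluster size and correlation below are hypothetical.

```python
def effective_units(n_responses: int, cluster_size: int, rho: float) -> float:
    """Approximate number of independent responses carried by n_responses
    that arrive in equal clusters of cluster_size with average
    within-cluster correlation rho (Kish's design effect)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    design_effect = 1 + (cluster_size - 1) * rho
    return n_responses / design_effect


# Hypothetical figures: 200 responses, independent versus in clusters of 4.
print(effective_units(200, 1, 0.5))  # 200.0
print(effective_units(200, 4, 0.5))  # 80.0
```

Under these assumed figures, 200 interlocked responses behave like about 80 independent ones. The score therefore rests on fewer points than the raw item count suggests.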

We have studied at some length the balance between the objective of simulating real-life situations in this sequential manner of testing and the objective of obtaining high, or reasonably high, reliability. While the reliability of our tests of Part I and Part II is quite consistently between [illegible] and [illegible], the internal consistency reliability of the programmed test for the first few administrations has been in the range of .40 to .70, with a mean of [illegible]. We are now taking several steps that may be expected to increase the reliability. The test has been lengthened: the number of items for the next administration, in January 1964, has been increased from approximately 200 to approximately 400. The examiners, the experts who construct the tests, are learning from the item analyses the need for more discriminating judgment in categorizing each choice as right, wrong, or equivocal. The task is quite different from, and considerably more arduous than, the more familiar task of deciding on the one best among five choices. On the other hand, the examiners find themselves on somewhat more familiar ground and feel that they are dealing with practical situations in a more realistic manner than when they are faced with the necessity of a single best choice.
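The Spearman-Brown prophecy formula offers one way to gauge the effect of doubling the test from about 200 to about 400 items. It is offered here only as an illustration, not as the Board's projection. The formula assumes the added items are parallel to the existing ones. The interdependent responses described above violate that assumption, so the figures are best read as an optimistic guide.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor,
    assuming the added items are parallel to the existing ones."""
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Doubling (about 200 -> about 400 items) across the reported range .40 to .70.
for r in (0.40, 0.70):
    print(f"{r:.2f} -> {spearman_brown(r, 2):.2f}")
# 0.40 -> 0.57
# 0.70 -> 0.82
```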

We have also looked closely at the correlation between this programmed test of clinical competence in Part III (taken after 6 to 12 months of internship) and the multiple-choice tests of knowledge of clinical medicine in our Part II (taken before internship). The correlation between this portion of Part III and the total Part II was .42 in 1962 and .35 in 1963. Corrected for attenuation, the correlation between these two tests in 1963 would have been .53. These correlations, positive and yet moderate, reflect about the degree of relationship we would expect between medical knowledge and additional elements of clinical competence that are inevitably based upon medical knowledge.
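The correction for attenuation divides the observed correlation by the square root of the product of the two tests' reliabilities. The text reports the observed .35 and the corrected .53 but not the reliabilities used. The sketch therefore works backward to the product of reliabilities those two figures imply. The .90 reliability for Part II in the last line is a hypothetical value chosen only for illustration.

```python
import math


def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction for attenuation: the correlation expected
    between two measures if both were perfectly reliable."""
    return r_xy / math.sqrt(r_xx * r_yy)


def implied_reliability_product(r_observed: float, r_corrected: float) -> float:
    """Back out the product r_xx * r_yy implied by an observed and a
    corrected correlation."""
    return (r_observed / r_corrected) ** 2


product = implied_reliability_product(0.35, 0.53)
print(round(product, 3))  # 0.436

# Hypothetical: if Part II's reliability were .90, Part III's would be about:
print(round(product / 0.90, 2))  # 0.48
```

Under that assumption, the implied Part III reliability falls inside the .40 to .70 range reported earlier, so the three published figures are mutually consistent.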

In conclusion, let me summarize by saying that we have developed a testing method that promises to open up new dimensions in evaluating professional competence. We have described this technique as "Programmed Testing" because of features that are similar in principle to "Programmed Teaching," that is to say, a step-by-step progression toward carefully designed objectives, each step accompanied by an increment of information essential to the sequential unfolding of the problems. The method is far from perfect and needs continuing refinement. It has, however, [illegible] our ability to evaluate objectively certain skills and qualities of professional competence, skills and qualities that we consider essential for certification of a physician's readiness to assume independent responsibility for the practice of medicine.

1. Cowles, John T., and Hubbard, John P. "Validity and Reliability of the New Objective Tests: A Report from the National Board of Medical Examiners." Journal of Medical Education, 29:30-34, June 1954.

2. Hubbard, John P., and Cowles, John T. "A Comparative Study of Student Performance in Medical Schools Using National Board Examinations." Journal of Medical Education, 29:[illegible], July 1954.

3. Rimoldi, H. J. A. "Rationale and Applications of the Test of Diagnostic Skills." Journal of Medical Education, 38:364-368, May 1963.
The concern of the medical profession with testing, and with the more general problem of assessment, [illegible]:

A. in the evaluation of applicants for a license to practice medicine in a state;

B. in the evaluation of applicants for admission to medical school;

C. in the evaluation of students during medical education.

The right to practice medicine is controlled by licensure in each of the states and territories. Although requirements vary widely from state to state, they generally include completion of a specified program of training leading to some type of doctoral degree (in most cases the M.D.) and satisfactory performance on examinations designed to evaluate medical knowledge. Such examinations ordinarily consist of [illegible] and are collectively called "State Boards." Since they are typically locally constructed, it is not surprising that their nature varies widely. Furthermore, because each state jealously guards its right to grant licenses to practice, a physician moving from one state to another, or one who wishes to practice in two states simultaneously, finds it necessary to submit to two or more different examinations. While a few states have entered into reciprocal agreements, i.e., agreed to recognize one another's licensing examinations, such reciprocation is sufficiently rare as to have led to the development of the program of National Board Examinations.

The need for, and use of, tests in the evaluation of applicants to medical schools is of more recent origin. Those of you who are familiar with the history of the medical profession will recall that until relatively recently, medicine, like law, was learned by apprenticing oneself to an older and more experienced member of the profession, reading a few books, and passing the state licensing examination. There were a few medical schools associated with certain of our older universities, but only a relatively small proportion of the practicing physicians of that day ever attended them. During the nineteenth century there was a very rapid growth in the number of colleges and schools offering professional training in medicine, but a large proportion of them were [illegible], with the result that they selected their students more for the tuition which they would bring with them than for their aptitude for the study of medicine. This state of affairs is reflected by the fact that in 1904 there were twice as many medical schools in the United States as there are in 1964! As a result of the survey of medical education sponsored by the Carnegie Foundation, which culminated in the famous Flexner report (1), two very significant changes occurred in professional training for medicine: (a) the standards of medical education were markedly increased; and (b) this led medical schools more and more to seek an affiliation with a university. Today there are only a few non-affiliated schools remaining.

Although medical practitioners had always been accorded fairly high status in their communities, these developments served to enhance further the prestige associated with medicine as a profession. As a result, membership in the profession became the aspiration of many more young people than could be accommodated in recognized medical schools. Thus, the faculties of these institutions found themselves confronted with the necessity of selecting the most promising applicants for the study of medicine. It is not surprising, therefore, that medical schools were among the first to turn to a professional aptitude test. The Medical College Aptitude Test, which was developed in 1927, was an outgrowth of the Moss Medical Aptitude Test and Professional Aptitude Test (2). Since 1935 it has been administered every year in several hundred pre-medical schools to [illegible] all applicants for admission to medical schools.

The affiliation of medical schools with universities and, even more important, the introduction of [illegible] basic science teaching in the medical curriculum led to an increased concern on the part of medical school faculties with the evaluation of student achievement and the assigning of grades. There is as much disagreement among medical school professors as among teachers everywhere about the best methods of evaluating students and assigning grades. In fact, it is in these professional schools, where grades are so important in determining not only survival but assignment to internships and other professional opportunities, that there is ferment and discussion regarding [illegible] versus written tests, objective versus essay tests, and tests emphasizing short-term versus long-term learning; whether grades should represent progress made as a result of taking a course or an absolute level of accomplishment upon course completion; whether grades should be given primarily for factual learning or for the demonstration of professional skills; and so on.

While the ratio of the number of applicants to the number of available places in the admission class has declined over the last few years, medical schools are generally more concerned with the selection of their students than are universities or even other professional schools. This is true for two very good reasons. First, the high cost of constructing, equipping, and staffing medical schools results in a very high per-student societal investment as compared with other types of educational institutions. Second, because of the integrated nature of the curriculum in medicine, it is generally not feasible for medical schools to admit students in the second, third, and fourth years. Thus, any beginning student who does not succeed in the program leaves a costly hole in an establishment geared for a certain number of students. This, in turn, subjects the institution to public criticism for not turning out as many doctors as it was tooled up to do. The typical college or university may regret the loss of an entering freshman, but it can always fill its upper classes with transfer students and thus make full use of the resources of the institution.

Generally, neither the institution nor society is as painfully aware of the importance of good selection of beginning students as are medical schools.

Still another unique situation has contributed to the extensive concern of medical schools with testing. To a degree that is not true of any other category of professional education, the Association of American Medical Colleges monitors and coordinates the typical multiple applications of medical school candidates and provides feedback concerning the quality of students entering each medical school, thus tending to sensitize the admissions committees of medical schools to very wide differences in both the quantity and quality of applicants to different schools. The result is that medical schools in general probably invest more time and money in the evaluation of applicants than any other educational institution. For example, most medical schools have fairly large admissions committees whose members are responsible for interviewing all applicants to the school (3, 4) and make an eventual decision to admit or not admit an applicant only after an extended staff conference regarding each applicant. These, then, are the factors that have combined to develop an increasing concern on the part of the medical profession with the problems of testing.

My personal involvement in the problem of selecting medical students began about a dozen years ago, just about the time that Fiske and I completed our project on the selection of graduate students in clinical psychology (5). I was approached by the late Wayne Whittaker, assistant dean of the University of Michigan Medical School and Chairman of the Admissions Committee, who asked me to work with him to improve the selection of students for our medical school. Because I found that he and his committee were deeply concerned with selecting not only students who would succeed academically but also those who would become good physicians willing to accept social responsibilities commensurate with society's investment in them, I was delighted to accept his invitation to participate in a collaborative study.* With a small research grant we carried out a preliminary study of the senior class of 1952 and a more intensive study of students entering in 1952, the class that graduated in 1956. The findings which I am reporting here are based, for the most part, on 112 of the [illegible] members of the class of 1956. The fact that this group does not represent the entire class was primarily a function of class schedules rather than any biased selection of the sample.

Our broad objective was simply to try to improve the over-all quality of the students selected to receive medical training. In the hope of identifying variables that should be considered at the time of the selection decision, we made an intensive study of the "mistakes" of the admissions committee, i.e., those students who had been admitted but failed in the course of their training. We eventually accumulated data regarding 100 potential predictor variables.

* I have purposely postponed publication of certain of the findings until changes in staff, curriculum, and grading practices will make it impossible to point an accusing finger at any specific department.

For our criteria, we began, of course, with the most convenient and frequently used index of academic performance, the grade point average. Medical educators, like others, are free to admit that academic grades are not the only, and perhaps not the most important, criterion of success in medical education. Nevertheless, grades are regarded as important by both students and staff, and successful academic performance, especially in the first two (pre-clinical) years, is a sine qua non for developing one's clinical skills in the later years of medical training. Therefore, as soon as the first-year grade point average had become available for the class, correlational analyses were begun.* [Illegible] of the 100 predictor variables were significantly correlated (p < .05) with first-year GPA. Two variables, All Pre-med. Grades and Pre-med. Science Grades, tied for first place as the best predictors of this criterion, each yielding an r of +.61. This criterion was also significantly predicted by the four subscores of the MCAT, with coefficients ranging from [illegible] to [illegible]. Offhand, this would seem to reflect a relatively satisfactory state of affairs. However, members of the admissions committee were not satisfied. Even with validity coefficients of this magnitude, considerable error of prediction remains, and, as noted above, the desire not to lose already admitted students is very strong. Of greater concern to members of the admissions committee, however, was the finding that their individual ratings of the applicants yielded validity coefficients lower than that provided by a simple average of all pre-med. grades. The actual coefficients of the five members ranged from +.27 to [illegible], in spite of the fact that the ratings were made by persons who had had an opportunity to study the entire pre-med. transcript, review the profile of MCAT scores, read the letters of recommendation, and discuss each case at a staff conference at which the interview impressions of at least one of the committee members were reported.
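"Considerable error of prediction" can be quantified with the standard error of estimate, which is the criterion's standard deviation multiplied by the square root of 1 minus r squared. The sketch below applies that textbook formula to the r of .61 reported above. It is not an analysis from the study itself.

```python
import math


def variance_explained(r: float) -> float:
    """Proportion of criterion variance accounted for by a predictor."""
    return r * r


def relative_prediction_error(r: float) -> float:
    """Standard error of estimate expressed as a fraction of the
    criterion's standard deviation: sqrt(1 - r^2)."""
    return math.sqrt(1 - r * r)


r = 0.61  # best single predictor of first-year GPA reported above
print(f"variance explained: {variance_explained(r):.2f}")            # 0.37
print(f"prediction error / SD: {relative_prediction_error(r):.2f}")  # 0.79
```

Even the best predictor thus leaves the typical prediction error at roughly four-fifths of the spread of first-year grades.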

Quite obviously, something was wrong. There were two possibilities: (a) members of the admissions committee were not identifying and/or properly weighting relevant items from the large mass of information available to them; or (b) the criterion of first-year grades was not the appropriate one against which to check the validity of their individual judgments. I am sure that you can guess which of these alternatives was chosen by members of the admissions committee. While they were, of course, concerned with evaluating the aptitude of the student to complete successfully the prescribed course work of the medical curriculum (i.e., grades), they were much more concerned to identify those applicants who also had the traits of character conducive to becoming a good physician. Thus it would be necessary to secure additional and very different criterion measures of success in medicine before the unique validities of the ratings derived from this elaborate admissions procedure could be demonstrated.

* The writer gratefully acknowledges the assistance of Gordon Bechtel, Lillian Kelly, and Leonard Uhr, who served as research assistants at various stages of the study.

During the next three years much time and effort were devoted to the development and acquisition of alternative measures of achievement and performance in medicine. Eventually data became available for 54 criteria.

With 100 predictor variables and 54 criterion variables, several alternate modes of analysis suggested themselves and, thanks to the availability of the high-speed computer, several of them were carried out. In this paper, we shall be primarily concerned with an analysis of the resulting 5400 correlations between the 100 predictors and the 54 criteria. More specifically, we shall concern ourselves with the relative utility of the predictors, i.e., the number of alternate criteria which they predict, and with the significant correlates of each of the 54 criteria. I have intentionally selected the term correlates of the criteria rather than predictors because of the obvious limitations of the present study with respect to generalizability. With such a large number of variables and an N of only 112, it would have been possible to compute spuriously high multiple correlations to predict many of the criteria, correlations that would certainly have shrunk markedly if the resulting regression equations had been applied to another class. Furthermore, it must be remembered that I am reporting data not only for a single class but also for only one medical school. In view of the known differences not only in the quantity and quality of applicants but also in the selection procedures used by different schools, any attempt to make generalizations regarding the predictive value of specific variables for other institutions would be extremely hazardous. In spite of these limitations, I believe the findings are worthy of serious attention, not because of their immediate applicability to the problem of selection, but rather because of their implications for the problems of testing and measurement in all educational institutions, of which the medical school is but one example.
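The warning about spuriously high multiple correlations can be illustrated with two standard results. The study does not report applying either one. First, when k predictors are unrelated to the criterion, the expected sample R squared is k / (N - 1). Second, a Wherry-type adjustment estimates how much a sample R squared would shrink. The R squared of .40 with 20 predictors below is hypothetical. Only N = 112 and the 100 predictors come from the text.

```python
def null_expected_r2(n: int, k: int) -> float:
    """Expected sample R^2 when the k predictors are unrelated to the criterion."""
    return k / (n - 1)


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Wherry-type adjusted R^2 for n cases and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 112
print(round(null_expected_r2(n, 100), 2))  # 0.9  (all 100 predictors, pure noise)
print(round(adjusted_r2(0.40, n, 20), 3))  # 0.268 (hypothetical R^2 and k)
```

With all 100 predictors and only 112 cases, pure noise would be expected to produce an R squared near .90. This is why the author speaks of correlates rather than predictors.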

Potential Predictor Variables

Table I lists the 100 potential predictor variables selected for analysis; also shown is the number of the 54 criteria with which each variable showed a correlation of at least .20, i.e., a value yielding a p of < .05. Since there were 54 possible significant correlations for each predictor variable, chance alone would yield an expectancy of two to three such correlations for each predictor.
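The cutoff and the chance expectancy can be checked directly. The sketch uses Fisher's z approximation for the two-tailed critical r at N = 112. By that approximation, the .20 cutoff is a slightly conservative rounding of about .19. The per-predictor count assumes that negative correlations of .20 or more in absolute value were counted, as the negative entries in Table II suggest. The example row of correlations is hypothetical.

```python
import math
from statistics import NormalDist


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed critical |r| via Fisher's z transformation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z / math.sqrt(n - 3))


def count_significant(row: list[float], threshold: float = 0.20) -> int:
    """Number of criteria whose correlation with one predictor reaches
    |r| >= threshold (assumption: both signs count)."""
    return sum(1 for r in row if abs(r) >= threshold)


n_criteria, alpha = 54, 0.05
print(round(critical_r(112), 3))          # 0.186
print(round(n_criteria * alpha, 1))       # 2.7 significant r's expected by chance
print(count_significant([0.61, -0.33, 0.14, 0.20, -0.19]))  # 3 (hypothetical row)
```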

Part A of Table I lists 12 predictor variables which have been labelled Intellectual and Cognitive. This category includes pre-med. grades, MCAT scores, and ratings by the five individual members of the admissions committee, since these ratings appear to be primarily determined by the pre-medical academic record. Similarly, the month of acceptance is included in this category of variables because of the practice of according early admission to the applicants rated most favorably by the admissions committee. In general, it will be noted that these intellectual and cognitive variables yield significant correlations with a fairly large proportion of the 54 criterion variables.

Part B of Table I lists the 88 non-cognitive predictor variables used. The first subgroup of these includes 19 background variables derived from analysis of the application blank or the transcript of pre-medical college training. As will be noted, most of this group of variables predict more than a chance number of the 54 criteria. Note that the number of credit hours in biology appears to be the best of these predictors, yielding significant correlations with 12 of the 54 criteria. There follows an extensive list of measures of personality characteristics and interests. The first block includes the 16 scores derived from the Cattell 16 Personality Factor Questionnaire (6). The labels given these factors correspond to the positive or high-scoring end of each scale. Incidentally, these Cattell scores were based on an administration of the test to the class as seniors, whereas the 52 scores derived from the Strong Vocational Interest Blank (7) and all other variables of Table I were based on instruments administered before admission to medical school, i.e., under conditions which led applicants to perceive them as a part of the total process of admission.

Table I. Potential Predictor Variables

Variable | No. of Criteria

A. Intellectual and Cognitive Variables
1. All Pre-Med Grades | 2?
2. Pre-Med Science Grades | 2?
3. Rating: Adm. Comm. Member No. 1 | 26
4. Rating: Adm. Comm. Member No. 2 | 30
5. Rating: Adm. Comm. Member No. 3 | 23
6. Rating: Adm. Comm. Member No. 4 | 2?
7. Rating: Adm. Comm. Member No. 5 | 9
8. Month Accepted | 17
9. MCAT: Verbal | 1?
10. MCAT: Quantitative | 9
11. MCAT: Modern Society | 15
12. MCAT: Science | 14

B. Non-Cognitive Variables

Background Variables
1. Year of Birth | ?
2. Own a Car | ?
3. Father's Occupation | ?
4. Father's Education | ?
5. Mother's Education | ?
6. Percent Self-supporting | ?
7. Reported Est. of Summer Earnings | ?
8. Reported Amt. of Religious Activity | ?
9. Amt. of Pre-Med (3 or 4) | ?
10. Marital Status | ?
11. Month Applic. Submitted | ?
12. Height | ?
13. Weight | ?
14. No. of Credit Hrs. English | ?
15. No. of Credit Hrs. Foreign Language | ?
16. No. of Credit Hrs. Inorganic Chem. | ?
17. No. of Credit Hrs. Organic Chem. | ?
18. No. of Credit Hrs. Physics | ?
19. No. of Credit Hrs. Biology | 12

Cattell Personality Factor Questionnaire Variables
1. Cattell 16 PF: Cyclothymia | 8
2. Cattell 16 PF: Gen. Intell. | 4
3. Cattell 16 PF: Ego Strength | 4
4. Cattell 16 PF: Dominance | 17
5. Cattell 16 PF: Surgency | 5
6. Cattell 16 PF: Super-Ego | 1
7. Cattell 16 PF: Adventurous Cyclothymia | 12
8. Cattell 16 PF: Emot. Sensitivity | 5
9. Cattell 16 PF: Paranoid | 8
10. Cattell 16 PF: Hysterical Unconcern | 1
11. Cattell 16 PF: Sophistication | 11
12. Cattell 16 PF: Anx. Insecurity | 4
13. Cattell 16 PF: Radicalism | 10
14. Cattell 16 PF: Indep. Self-sufficiency | 12
15. Cattell 16 PF: Will Cont. Stability | 1
16. Cattell 16 PF: Nervous Tension | 10

Strong Vocational Interest Blank Variables
1. Artist | ?
2. Psychologist | ?
3. Architect | ?
4. Physician | ?
5. Osteopath | ?
6. Dentist | ?
7. Veterinarian | ?
8. Mathematician | ?
9. Physicist | ?
10. Engineer | ?
11. Chemist | ?
12. Production Manager | ?
13. Farmer | ?
14. Aviator | ?
15. Carpenter | ?
16. Printer | ?
17. Math.-Phys. Sci. Teacher | ?
18. Ind. Arts Teacher | ?
19. Voc. Agric. Teacher | ?
20. Policeman | ?
21. Forest Service Man | ?
22. YMCA Physical Dir. | ?
23. Personnel Dir. | ?
24. Public Admin. | ?
25. YMCA Secretary | ?
26. H.S. Soc. Sci. Teacher | ?
27. City School Supt. | ?
28. Minister | ?
29. Musician | ?
30. C.P.A. | ?
31. Senior C.P.A. | ?
32. Accountant | ?
33. Office Mgr. | ?
34. Purchasing Agent | ?
35. Banker | ?
36. Mortician | ?
37. Pharmacist | ?
38. Sales Manager | ?
39. Real Estate Salesman | ?
40. Life Insurance Salesman | ?
41. Advertising Man | ?
42. Lawyer | ?
43. Author-Journalist | ?
44. Pres. Mfg. Concern | ?
45. Interest Maturity | ?
46. Occupational Level | ?
47. Masculinity-Femininity | ?
48. "Anxiety" | ?
49. "Theoretical Values" | ?
50. "Economic Values" | ?
51. "Self-Confidence" | ?
52. "Sociability" | ?

Also McQuitty Health Index | ?

(? = figure illegible in the source copy.)

The first 47 variables derived from the Strong VIB are the familiar vocational interest scores, which were coded on the basis of numerical scores rather than letter grades. In general, high scores reflect a pattern of interests highly congruent with those of persons successfully engaged in each profession or occupation. Variables 48 to 52 were also derived from responses to the Strong VIB, using empirically derived scoring keys to assess personality variables alternatively measured by the Taylor Manifest Anxiety Scale of the MMPI (8), the Allport-Vernon Study of Values, and the Bernreuter Personality Inventory (9). Finally, the McQuitty Health Index (10) was based on responses to a self-report form developed to assess personality integration.

Although none of these non-cognitive variables correlated significantly with a very large number of the criteria, it should be noted that most of them yield more than a chance number of significant correlations. Thus, nine or more of the 54 criteria were found to be significantly correlated with the Cattell factors of Dominance, Adventurous Cyclothymia, Sophistication, Radicalism, Independent Self-sufficiency, and Nervous Tension. A fairly impressive number of significant correlations also appeared for the interest patterns of Psychologist, Veterinarian, Chemist, Printer, Math-Physical Science Teacher, Policeman, CPA, Mortician, and Sales Manager. Finally, the "Anxiety" score derived from the Strong yielded nine significant correlations with criterion measures.
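Nine or more significant correlations out of 54 is well above the chance expectancy of 2.7. The sketch treats each predictor's 54 tests as independent trials at p = .05 and computes the binomial probability of reaching a given count by chance alone. That independence assumption is only a rough simplification, not the study's analysis. The criteria are themselves intercorrelated; for example, the first three years' grade point averages correlate about .80. Clusters of "significant" results are therefore more likely than this model suggests.

```python
from math import comb


def prob_at_least(k: int, n: int = 54, p: float = 0.05) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Read here as the chance that one predictor shows k or more
    'significant' correlations among n criteria when every true
    correlation is zero and the n tests are independent.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


for k in (3, 6, 9):
    print(k, f"{prob_at_least(k):.4f}")
```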

The Criteria and Their Correlates



Table II summarizes the 54 criteria used in this study. Shown also for each criterion is the number of the potential cognitive and non-cognitive predictor variables with which it was significantly correlated. Column 3 indicates the cognitive variable yielding the highest, and column 4 the non-cognitive variable yielding the highest, correlation with each of the criteria.

Table II. The Correlates of 54 Alternative Criteria of Performance in Medicine

Criteria | No. of Significant r's with 12 Cognitive / 88 Non-Cognitive Variables | Highest Correlated Cognitive Variable | Highest Correlated Non-Cognitive Variable

A. Grades
1. 1st-yr. GPA | 11 / ? | Pre-Med Grades .61 | Veterinarian .24
2. 2nd-yr. GPA | 12 / ? | Pre-Med Grades .56 | "Anxiety" .30
3. 3rd-yr. GPA | ? / ? | Pre-Med Grades .47 | "Anxiety" .31
4. 4th-yr. GPA | 6 / 5 | Pre-Med Grades .31 | No. of Hrs. Biology -.22; Dominance ?.22
5. Over-all GPA | 12 / 5 | Pre-Med Grades .57 | "Anxiety" ?.28
6. Pub. Health (2nd yr.) | 5 / 7 | Pre-Med Grades .45 | Pharmacist .26
7. Medicine (4th yr.) | 5 / 1 | Month Accepted ?.33 | No. of Hrs. Foreign Language .22
8. Surgery (4th yr.) | 5 / 2 | MCAT (Quant.) ? | Dominance -.36
9. Ob. & Gyn. (4th yr.) | 0 / 4 | [Cattell "Intell." .13] | Theor. Values -.25; Self-Sufficiency -.25
10. Pediatrics (4th yr.) | 7 / 6 | Pre-Med Grades .38 | No. of Hrs. Biology -.31
11. Psychiatry (4th yr.) | 0 / 12 | [MCAT (Verbal) .14] | Mother's Educ. .28

B. Nationally Administered Tests
1. ? Exam. | ? / ?8 | MCAT (Mod. Soc.) ? | "Anxiety" ?
National Boards
2. Medicine | 7 / 12 | MCAT (Science) .37 | Anx. Insecurity ?
3. Surgery | 12 / 29 | MCAT (Science) .40; Pre-Med Grades .40 | Adventurous Cyclothymia ?.38
4. Ob. and Gyn. | 10 / 6 | Pre-Med Grades .39 | Pub. Admin. .28
5. Public Health | 10 / 5 | MCAT (Science) .42 | Farmer .22
6. Pediatrics | 10 / 10 | MCAT (Science) .40 | Banker ?.27
7. Nat. Bds. Total | 10 / 25 | MCAT (Science) .48 | Printer .32

C. State Boards
1. Anatomy | 8 / 4 | Month Accepted -.38 | Adv. Cyclothymia -.24
2. Hist. and Embry. | 6 / 8 | Member No. 4 .30 | Lawyer .28
3. Physiology | 0 / 8 | MCAT (Verbal) .14 | Own Car -.25
4. Chem. and Toxicology | 4 / 7 | MCAT (Verbal) ?.21 | Physician ?.41
5. Bacteriology | 2 / 30 | Member No. 5 .29 | Sales Mgr. -.38
6. Pathology | 0 / 8 | Cattell "Intell." ?.30 | Dominance -.32
7. Hygiene | 7 / 5 | Member No. 5 .34 | No. of Hrs. Org. Chem. -.27
8. Practice | 0 / 8 | MCAT (Mod. Soc.) .17 | Theor. Values -.24
9. Med. Jurisprudence | 0 / 12 | Gen. Intell. -.26 | Occup. (?) .30
10. E.E.N.T. | 0 / 0 | MCAT (Science) .13 | Occ. Level .19
11. Obstetrics | 1 / 1 | MCAT (Verbal) -.21 | Weight -.2?
12. Surgery | 1 / 3 | Pre-Med Sci. .24 | Surgency -.3?
13. Gynecology | 0 / 5 | MCAT (Verbal) -.14 | No. of Hrs. Physics .2?
14. Materia Medica | 0 / 1 | MCAT (Med. Soc.) .19 | Father's Educ. .2?

D. Sociometric Choices as Seniors
1. Camping Companion | ? / ? | MCAT (Science) -.23 | Self-Sufficiency -.33
2. Office Partner | ? / ? | Pre-Med Grades .28 | Chemist -.29
3. Research Promise | ? / ? | Pre-Med Grades ? | Dominance -.27
4. Intimate Friend | ? / ? | MCAT (Verbal) -.12 | Mortician .31
5. Phys. to Own Family | ? / ? | Pre-Med Sci. Gr. .39 | Dominance -.31
6. Colleague, Hosp. Staff | ? / ? | Pre-Med Grades .30 | Dominance ?.26
7. Hosp. Teaching Staff | ? / ? | Pre-Med Sci. Gr. .40 | Self-Sufficiency -.23
8. Highest Income | ? / ? | MCAT (Quant.) ?.21 | Sales Mgr. .37
9. Pers. ? G.P. | ? / ? | MCAT (Verbal) -.29 | Father's Educ. -.47
10. Med. Sch. Teacher | ? / ? | Pre-Med Grades .35 | Nervous Tension .23
11. Int. in Public Health | ? / ? | Pre-Med Grades ? | Dominance -.32
12. Willing to Accept Salaried Position | ? / ? | MCAT (Quant.) ? | Height ?.31
13. Disease (i.e., Specialty) Orientation | ? / ? | Member No. 2 .31 | Father's Educ. .47
14. Hosp. Administrator | ? / ? | Pre-Med Grades .32 | Radicalism -.23

E. Internship Ratings
1. Personal Appearance | ? / ? | MCAT (Med. Soc.) .22 | Am't Rel. Act. .23
2. Desire to Learn | ? / ? | Pre-Med Sci. Gr. .32 | Math.-Phys. Sci. Teacher .36
3. Over-all Med. Knowledge | ? / ? | Pre-Med Grades .23 | Dominance -.32
4. Diagnostic Competence | ? / ? | [Pre-Med Grades ?.20] | Math.-Phys. Sci. Teacher .32
5. Integrity | ? / ? | [MCAT (Quant.) ?] | Math.-Phys. Sci. Teacher .25
6. Sensitivity to Patients' Needs | ? / ? | [MCAT (Mod. Soc.) ?] | Dominance -.30
7. Abil. to Inspire Confidence | ? / ? | [Pre-Med Grades ?] | Dominance ?.26
8. Over-all Promise | ? / ? | [Pre-Med Grades ?] | Sales Mgr. .17

(? = figure or sign illegible in the source copy.)

These 54 criteria have been grouped into five categories, A through E. Those in A require little further description than is provided by their names. The grade in the second-year course in Public Health was selected as a criterion because this course at the time was generally regarded by the students as both difficult and not very relevant to their training. The several fourth-year course grades presumably reflect the faculty's best evaluation of the performance of the medical student in the clinical, as contrasted with the pre-clinical, years of medicine. As far as could be determined, these grades were based not so much on tests as on the impressions made by the student as a participant in ward rounds, conferences, and seminars.

As will be noted, the best predictor of medical school grades
throughout the four years is the Pre-med. Grades (Average).
Of the non-cognitive correlates, the "Anxiety" score derived from
the Strong VIB appears most frequently. Whereas there is a
relatively high intercorrelation (about .80) between the grade
point averages for the first three years of medical school, the
situation for the fourth-year course grades is quite different.
First-year and fourth-year grades correlate only .53, and the
median intercorrelation among the six fourth-year grades is only
.22. It is therefore not surprising that a very different pattern
of correlates emerges for these fourth-year course grade criteria.
As will be noted in several instances, a non-cognitive variable
is more closely associated with grades in these courses than a
cognitive variable, suggesting the degree to which these grades
are assigned on the basis of impressions made by the student
while on a particular service and thus are more a function of
the student's personality characteristics than of his intellectual
performance.

Category B of the criteria includes scores made by the students
on nationally administered objective tests. In general it will be
noted that performance on these objective tests at the end of
medical training is predicted by most of the cognitive predictor
variables, best by the MCAT Science score and Pre-med. Grades.
It is of considerable interest, however, that grades on these
objective tests are also significantly correlated with several non-
cognitive predictor variables. For example, 29 of the 88 non-
cognitive variables are significantly associated with National
Board scores in Surgery, the correlation with one of them being
almost as high as that of any cognitive variable.

We now turn to a consideration of the criteria listed in Part C
of Table II, marks on the State Board examination (State
Boards), required for licensure in Michigan. As contrasted with
the National Boards, which are objective examinations, these
are typically essay examinations prepared by experienced and
often older physicians who volunteer to prepare and grade ex-
aminations in each of the 14 subject matter areas. Whereas the
median intercorrelation among the scores on the National Boards
is .51, the modal intercorrelation among the scores on the 14
parts of the State Board examination is zero, the median is only
.10, and 16 of the intercorrelations are negative! This being the
case, it is not surprising to find markedly different patterns of
correlates of the grades on the various subparts of the State
Boards. For seven of the 14, we note no significantly correlated
cognitive predictor variable. And in general, for each a non-cog-
nitive variable correlates about as highly as a cognitive variable,
suggesting that even though these are written examinations,
marks are determined in part by the student's personality charac-
teristics: interests, values, and background variables.
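The summary figures used here (the median intercorrelation and the number of negative coefficients) are taken over the distinct off-diagonal entries of a correlation matrix. The Python sketch below shows that computation; the 4 × 4 matrix is invented for illustration and is not the study's data, and the function names are hypothetical.

```python
from statistics import median


def off_diagonal(r):
    """Return the upper-triangle (i < j) entries of a square correlation matrix."""
    n = len(r)
    return [r[i][j] for i in range(n) for j in range(i + 1, n)]


def summarize_intercorrelations(r):
    """Number of distinct pairs, median coefficient, and count of negative coefficients."""
    values = off_diagonal(r)
    return {
        "pairs": len(values),
        "median": median(values),
        "negative": sum(1 for v in values if v < 0),
    }


# Hypothetical 4 x 4 matrix for four examination parts (illustrative values only).
R = [
    [1.00, 0.10, -0.05, 0.20],
    [0.10, 1.00, 0.00, 0.15],
    [-0.05, 0.00, 1.00, -0.10],
    [0.20, 0.15, -0.10, 1.00],
]

print(summarize_intercorrelations(R))
```

For the 14 parts of the State Board examination the same function would summarize 14 × 13 / 2 = 91 coefficients.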

Part D of Table II lists 14 variables derived from Sociometric
Choices made by members of the class of 1956 near the comple-
tion of their medical training. In our search for more relevant
criteria (and, hopefully, for criteria more predictable from the
ratings of the medical admissions committee!) we decided to
capitalize on the rather extensive opportunities which medical
students have to become acquainted with each other's strengths,
weaknesses and special competences. In brief, these sociometric
ratings were collected as follows: all members of the senior class
were assembled in one room, provided with a list of all members
of their class, and before they knew what was to follow, they
were asked to star the names of the 40 fellow class members
whom they felt they knew best. They were next asked to select
the three most desirable (or most likely) and the three least
desirable (or least likely) persons out of this group of 40 fitting
each of the categories indicated by the labels associated with
these 14 criteria (cf. Table II, Part D). The score for each stu-
dent on each criterion was simply the algebraic sum of the
number of positive and negative choices on each item.
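The scoring rule just described, an algebraic sum of positive and negative choices, can be sketched in a few lines of Python. The rater and student names are hypothetical, and the sketch assumes that each positive choice counts +1 and each negative choice counts -1; it does not model the earlier step of starring the 40 best-known classmates.

```python
from collections import Counter


def sociometric_scores(nominations):
    """Algebraic sum of choices for one criterion.

    nominations maps each rater to a pair (positive, negative): the
    classmates named as most and least fitting the category.
    A positive choice counts +1 and a negative choice counts -1.
    """
    scores = Counter()
    for positive, negative in nominations.values():
        for name in positive:
            scores[name] += 1
        for name in negative:
            scores[name] -= 1
    return dict(scores)


# Hypothetical nominations for one criterion (e.g. "Office Partner").
choices = {
    "rater1": (["Adams", "Baker", "Clark"], ["Davis", "Evans", "Ford"]),
    "rater2": (["Adams", "Clark", "Davis"], ["Evans", "Ford", "Grant"]),
    "rater3": (["Baker", "Adams", "Evans"], ["Ford", "Grant", "Clark"]),
}

print(sociometric_scores(choices))
```

Because every rater here names three positive and three negative choices, the scores on any one criterion sum to zero across the class.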

It is of interest to note that practically all of these sociometric
criteria are significantly associated with far more than a chance
number of the potential predictor variables. In fact, most of them
can be predicted about as well as any of the categories of criteria.
Furthermore, the pattern of the significant correlates seems to
make sense in that those sociometric criteria most obviously
associated with intellectual performance are most likely to be
correlated with cognitive variables, whereas those primarily re-
lated to social acceptability are more often correlated with non-
cognitive variables. Finally, the pattern of the correlates makes
sufficiently good sense to suggest that these sociometrically
derived criteria may have considerable validity […]
performance.

Our final effort to secure additional and still more relevant
criteria of performance as a physician is reflected in the Intern-
ship Ratings listed in Part E of Table II. With the assistance of
a number of members of the medical school staff, these eight
variables were selected as those believed to be most relevant and
the most ratable by the supervisors of medical school graduates
during their year of internship. We note immediately that these
criterion measures tend to be less often significantly correlated
with the predictor variables than was the case for the sociometric
criteria. Only two of them, ratings of "Desire to Learn" and "Over-
all Medical Knowledge," have more than a chance number of
correlates among the cognitive predictors. Although each of them
tends to be more closely associated with some non-cognitive pre-
dictor than with a cognitive one, the general magnitude of the
correlations tends to be low. Finally, we note that for the rating
"Over-all Promise" there are no significant correlates whatsoever.
Apparently this rating, […] by many different supervisors in
different internships, reflects such a composite of unsystematically
weighted […] as to result in its not being significantly related to
any of the 100 predictor variables.

In summary, most of the criteria were found to have a number
of correlates among predictor variables available before
admission to medical school. In fact, most of the criteria could
be reasonably well predicted by the weighted combination of
a small subset of predictor variables. Unfortunately, however, be-
cause of the relatively low intercorrelations among the alter-
native criteria, a different set of predictor variables would be
needed to select applicants likely to rank high on alternate criteria.
We have already noted the extremely low intercorrelations among
parts of the State Board examination. The problem of the validity
of alternate criteria is even more dramatically demonstrated by
examination of the intercorrelations of presumably alternative
criteria of the same type of accomplishment. For example, the
fourth-year course grade correlations with National Board scores
are as follows: Pediatrics .37, Medicine .33, Surgery .19, Ob-
stetrics and Gynecology .12. And, as might be expected, neither
of these criterion measures correlates significantly with State Board
examinations bearing the same label! Under the circumstances,
it is somewhat surprising that most of these criteria are at all
predictable from data obtained before admission to medical
school.

The Criterion Correlates of the
Most Promising Predictor Variables

From Table I, we noted that those variables listed in category
A, Intellectual and Cognitive, are generally the most promising
predictors of a large number of criterion variables. In fact, all
12 of them yield significant correlations with nine or more of
the 54 criterion variables, most typically appearing as the best
predictor of the more intellectually loaded criterion measures.
The most promising intellectual predictor for this particular group
of students was the average of all pre-medical grades, rather
than the average of the pre-medical science grades only, as mem-
bers of the admissions committee had anticipated. In general,
the ratings of the members of the admissions committee, here
categorized as intellectual or cognitive variables, showed signifi-
cant correlations with a fairly large number of criterion variables,
but most typically with those which might be categorized as re-
flecting intellectual or academic accomplishment rather than those
reflecting performance as a physician. Only rarely did these
ratings of individual committee members turn out to be as pre-
dictive of any criterion as one or more of the pieces of informa-
tion available to the person making the rating!

The most likely explanation of this attenuated potential validity
of clinical judgments of academic performance appears to be a
function of the background variable B.19 of Table I. The
number of credit hours in biology is the only one of these back-
ground variables associated with as many as nine criterion vari-
ables. Since the medical school at that time required all applicants
to present 12 credits of biology, this variable represented the
extent to which applicants presented credit hours in biology in
excess of this minimal requirement. Many pre-med. students,
especially if their over-all academic record is not good, are en-
couraged to take additional credits in biology as an indication
of their strong interest in medicine and because they are of the
opinion that members of the admissions committee would be
favorably impressed with a transcript reflecting elected courses
in biology. This turns out to have been the case. In general, the
ratings of the members of the admissions committee tend to be
positively correlated with the number of hours of biology. How-
ever, this same variable, number of hours in biology, yielded a
significantly negative correlation with 12 of the 54 criteria! These
correlations were as follows: State Board Hygiene -.[?]; 2nd-yr.
GPA -.27; 3rd-yr. GPA -.24; 4th-yr. GPA -.22; over-all GPA
-.23; grade in Public Health, second year, -.[?]0; fourth-year grade
in Pediatrics -.[?]; National Cancer Exam. -.23; National Boards
Medicine -.23; National Boards Public Health -.20; National
Board Pediatrics -.22; National Board Over-all -.21. In a word,
the potential validity of the clinical predictions of academic suc-
cess was seriously attenuated by the fact that the members of the
admissions committee were noting the number of hours of bi-
ology as a relevant predictor but weighting it positively rather
than negatively!

Of the non-cognitive variables, the one yielding the largest
number of significant correlations with the criteria used was
Cattell's factor labelled Dominance-Ascendance versus Submis-
siveness. These correlations were as follows: State Board Physi-
ology -.[?]3; State Board Pathology -.32; Office Partner -.24;
Worthy Recipient of Research Grant -.27; Intimate Friend -.2[?];
Physician to own Family -.31; Colleague, Hospital Staff -.26; Hos-
pital Teaching Staff -.20; Personal Satisfaction as G.P. -.37; In-
terest in Public Health -.[?]; Hospital Administrator -.[?]3; 4th-yr.
GPA -.22; 4th-yr. grade in Surgery -.[?]; Desire to Learn -.27;
Over-all Medical Knowledge -.32; Sensitivity to Patients' Needs
-.30; Ability to Inspire Confidence -.26. Since a high score on
this variable reflects a tendency to be self-assertive, boastful, con-
ceited, aggressive and pugnacious, it appears that those students
characterized by submissiveness, modesty, and complacency are
more likely to be positively evaluated on these generally non-
cognitive criterion variables. Another personality variable meas-
ured by the Cattell 16 PF shows a similar pattern of negative
correlations with 12 of the criterion measures. It is Adventurous
Cyclothymia, representing a continuum characterized by advent-
urous versus shy, timid; gregarious versus aloof; and frank
versus secretive. In general, withdrawn schizothymia seems to be
more highly prized in this particular subculture, the only signi-
ficant positive correlation being .29 with the sociometric choice,
Likely to Make the Highest Income.

Four additional scores from the 16 PF yielded significant cor-
relations with 10 or more of the criterion variables. These were:
Sophistication (all positive correlations except with State Board
Gynecology); Radicalism (generally negative correlations except
with National Board Over-all); Independent Self-Sufficiency (gen-
erally positive correlations with intellectually loaded criteria and
negative ones with sociometric choices involving interpersonal
relations); and Nervous Tension (generally positively correlated
with intellectually loaded criteria).

Of the 52 variables derived from the Strong, seven were found
to be significantly correlated with nine or more of the criteria.
Psychologist scores are typically negatively correlated with socio-
metric choices; Veterinarian scores are positively correlated with
ten criteria, including Physician to own Family; Chemist scores
are negatively correlated with several sociometric choices but
positively with Disease or Specialty Orientation, Willingness to
Accept Salaried Job, and National Board scores; Policeman scores
are positively correlated with 10 criteria, mostly sociometric
choices and intern ratings; CPA scores typically yield negative
correlations with criteria. Sales Manager scores are negatively
correlated with State Board in Bacteriology, Interest in Public
Health, Willingness to Accept a Salaried Job, with National
Boards in Surgery and National Boards Over-all; Sales Manager
scores are positively associated with State Boards in Practice,
Likely to Make Highest Income, 4th-yr. Grade in Psychiatry,
and Sensitivity to Patient Needs as rated by the Intern Supervisor.

Another Strong VIB score, Printer, proved to be a relatively
good predictor of several different criteria, with correlations as
follows:

State Boards Bacteriology  [?]
Sociometric Research Promise  .21
Interest in Public Health  -.20
Six National Boards Scores  .[?]5 to .32
Intern, Over-all Medical Knowledge  .29
Intern, Diagnostic Competence  [?]

By contrast, Strong VIB scores for Physician yielded but two
significant correlations: [?] with State Boards in Chemistry and
Toxicology and -.21 with the sociometric choice as Office Partner.

The most likely explanation of the lack of validity of this score
is that this group of subjects, both as the result of self-selection
and the selective process of admission, was so relatively homo-
geneous with respect to the interest pattern measured by the
Physician key that there was little opportunity for covariance to
occur.
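The restriction-of-range explanation can be illustrated with a small simulation. This Python sketch is an illustration under stated assumptions, not a re-analysis of the study: it draws a hypothetical predictor and criterion that correlate about .50 in the full applicant population, then keeps only the top 20 percent on the predictor, standing in for a group made homogeneous by self-selection and admission. All parameter values are assumptions chosen for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho=0.5, n=20000, keep_top=0.2, seed=1963):
    """Correlation in a full population vs. a group selected on the predictor."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # y shares variance rho**2 with x; the rest is independent noise.
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    # Keep only cases at or above the (1 - keep_top) quantile of the predictor.
    cutoff = sorted(xs)[int((1 - keep_top) * n)]
    kept = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
    rx, ry = zip(*kept)
    return pearson(xs, ys), pearson(rx, ry)


full_r, restricted_r = simulate()
print(f"full population r = {full_r:.2f}, selected top 20% r = {restricted_r:.2f}")
```

With these settings the correlation in the selected group should come out at roughly half the population value, although the exact figure depends on the random seed.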

Finally, the Anxiety score, derived by scoring the Strong VIB
responses with a specially developed key, yielded a consistent
array of positive correlations with nine criterion variables, all
heavily loaded with intellectual and academic accomplishment.
Interestingly enough, this variable seems to be tapping some-
thing different than the Nervous Tension factor of the Cattell 16 PF,
which was more likely to be positively correlated with sociometric
choices.*






E. Lowell Kelly



Discussion

In view of the known uniqueness of many of the criterion
measures employed, it is encouraging to discover that so many
of the predictor variables showed significant and often meaning-
ful correlations with so many criteria. Obviously, however, any
practical program of student selection would require some con-
sensus on the part of a faculty regarding the relative importance,
and hence the manner of weighting, of alternative criterion measures
before making a decision regarding predictor variables. Fortun-
ately, factor analyses of our criterion variables indicate that
there are probably not more than five or six really meaningful
dimensions involved. If satisfactory measures of this limited set
of criteria could be developed, it is highly probable that even
better predictive devices could be developed than the ones used in
this study; e.g., pre-med. grades might well be weighted by the
median SAT score of freshmen admitted to each pre-med. college;
the parts of the MCAT could be designed to predict more speci-
fic criteria; empirically derived keys for the Strong VIB might
well provide more useful scores than the occupational keys now
available, etc.

Obviously, however, no test or test battery and no statistical
technique can answer the fundamental question of what kind of
a physician the faculty of a given medical school wishes to pro-
duce. Given a multidimensional criterion, which appears to be
the case, and relatively non-overlapping sets of predictor vari-
ables for each, the "yes-no" decision required in student selection
must in the long run depend on the hierarchical ranking and
weighting of the criterion dimensions by the faculty concerned.
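The weighting decision described here can be made concrete with a short Python sketch. The two criterion dimensions, the predicted scores, and the faculty weights are all hypothetical; the sketch simply ranks applicants by a weighted sum of their predicted standing on each dimension, to show how the same predictions yield different selections under different weightings.

```python
def rank_applicants(predicted, weights):
    """Rank applicants by a weighted sum of predicted criterion scores.

    predicted: {applicant: {dimension: predicted z-score}}
    weights:   {dimension: faculty-assigned weight}
    Returns applicant names from highest to lowest composite.
    """
    composite = {
        name: sum(weights[d] * scores[d] for d in weights)
        for name, scores in predicted.items()
    }
    return sorted(composite, key=composite.get, reverse=True)


# Hypothetical predicted z-scores on two relatively unrelated criterion dimensions.
predicted = {
    "A": {"academic": 1.2, "interpersonal": -0.4},
    "B": {"academic": 0.1, "interpersonal": 1.1},
    "C": {"academic": 0.6, "interpersonal": 0.5},
}

print(rank_applicants(predicted, {"academic": 0.8, "interpersonal": 0.2}))
print(rank_applicants(predicted, {"academic": 0.2, "interpersonal": 0.8}))
```

With the first weighting applicant A ranks first; with the second, applicant B does, even though the predictions themselves are unchanged.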

The findings of the study here reported strongly suggest that
wise decisions regarding the product desired cannot be arrived
at by any amount of staff discussion of the problem in the ab-
stract. Only with the aid of an ongoing program which monitors
the characteristics of applicants selected and rejected, and of the
criteria used to assess success in both the school and in practice,
and feeds back the interrelationships among these variables, will
a staff be in the position of knowing to what degree its stated
objectives are being attained. Fortunately, with the ready avail-
ability of modern computers, an ongoing program of
"quality control" is now entirely feasible for any medical school.
Obviously, at least one appropriately trained professional person
is needed to identify the essential variables, to collect and
analyze the data in a systematic fashion, and to interpret the
results back to the faculty members concerned (11).

While the findings of this study of a single class in one med-
ical school do not justify any recommendations regarding speci-
fic procedures to be used in the selection of medical students,
they do point to a number of more general conclusions, each
with implications for measurement in all institutions involved in
professional training:

A. The criterion problem is both important and complex. In-
stead of a neat unidimensional criterion, it appears likely that
there are several relatively unrelated criterion dimensions
of success in professional education and practice. Since each
of these dimensions is likely to be regarded as important
by subgroups of the faculty and by segments of the society
which the profession serves, it is essential that improved
measures of these criterion dimensions be developed.

B. It appears likely that reasonably valid predictions of alter-
nate criteria of professional performance can be made on the
basis of data obtainable before admission to the professional
school, but a different subset of predictor variables will ob-
viously be required to predict uncorrelated criteria.

C. In view of the limited number of applicants for professional
education (an applicant to medical school has better
than one chance in two of being accepted by a medical
school within a couple of years) and of the differ-
ential pattern of predictor variables correlated with alternate
criteria of performance, it is simply not feasible for any
school to attempt to select applicants who will rank high on
all criterion dimensions. This suggests the possible desirability
of an explicit decision on the part of the staff of each pro-
fessional school with respect to the particular dimension(s)
of professional performance which that school wishes to max-
imize both in its program of student selection and its program
of instruction. Alternatively, larger professional schools may
wish to consider the establishment of clearly differentiated
programs of professional training, with the consequent im-
plications for using different variables in student selection
and expecting very different kinds of professional perform-
ance in the graduates of the alternative programs of education.
Will you forgive me if I use this as an occasion to clear my
own mind on several issues that relate to the growth of intellect
in human beings? I have been engaged these last several years
in research on this opaque topic, reading, experimenting, puzz-
ling over recalcitrant data, arguing with wiser men than I, even
mulling over the ancient problem of the parallelisms that may
exist between the emergence of the species Homo and the growth
of the child. The urge to clear my thoughts is not only generic,
but rather specific in this case, for very shortly we shall be
taking possession at the Center for Cognitive Studies of a quite
handsomely equipped mobile laboratory that will make it poss-
ible for us to go out to where children are and test their func-
tioning under standard conditions that, until now, have been
hard to obtain. So if at times my conjectures seem tortured
or perhaps foolish, you will know that at least the motive is
honorable and practical.

I should like to talk about several subjects that are particu-
larly bedevilling, the first of which has to do with the nature
of thought. I shall take it that by thinking we mean that an
organism has freed itself from domination by the stimulus, that
he is able to maintain an invariant response in the face of a
changing stimulus input or able to vary response in the face
of an invariant stimulus environment. In short, we can conceive
of something remaining the same in some essential respect, though
its appearance changes drastically, and also entertain different
hypotheses about it though its appearance stays the same.

The means whereby an organism effects its freedom from
stimulus control is through mediating processes, as they have
come to be called in recent years. A mediating process consists
at very least of some representation or model of the environment
[…] representations such that the organism can represent not only past
and present states, but also states of the world that might exist.
Or, to use another set of words, thinking involves constructing
a model of the world as we have experienced it and of having
rules for spinning the model into positions from which we can
read off predictions of things to come or things that might be.
If all of this apparatus is to have any functional significance
for the organism, there must first of all be some corres-
pondence between the model one constructs in one's mind (to use
the old-fashioned term) and the world in which one must oper-
ate. And moreover, if the rules for spinning or transforming the
model are to have any predictive or extrapolative value, they
must also have some bearing upon the processes that go on in
nature. What assures functional utility of this kind is, of course,
feedback and correction that occur when we attempt to use
thinking to deal with some domain of experience or potential
experience.

There are various questions that immediately pose themselves,
given this conception, and on closer inspection they turn out to
be questions not only about the operation of thought but also
about the nature of intellectual growth. Let me set these questions
out, and then we can turn our attention to them seriously.

The first has to do with the nature of representation. How do
we in fact represent the world? In this case I shall define the
world simply as the recurrent regularities in man's experience
and align myself with Ernst Mach (1914) in order to avoid any
metaphysical fidgeting. The question of how to represent things
turns out not only to be a problem for the psychologist inter-
ested in thought or memory but also a problem for the com-
puter simulator who faces the issue of how to organize storage
and retrieval of information. For us the question becomes one
about the development of representations. They change with
growth. How?

Secondly, what accounts for the scope and connectedness of
a particular representation? Some representations take in great
generic chunks of the world, and permit ready recognition of the
relations between […]
and time-bound, almost assuring there will be very little transfer
of knowledge and skill from one situation to another. This trans-
ferability increases enormously with growth. How does it come
about?

Thirdly, how do we operate upon our models or representa-
tions of the world in order to predict or extrapolate or otherwise
go beyond the information given? Are these operations like the
rules of logic or of language or what? Surely they are not the
same for all ages, for all conditions; that is plain, or else there
would be no disagreements. If we have learned anything from
Piaget (1950) at all, it is certainly that the "logic" of the child
of three is not that of the child of six. But you may well knit
your brows over my use of the word logic, with or without
quotation marks. For surely the operations of thought are not
really logic.

And finally, what is the role of the genetic […]
of man's capacity to construct and use models of the world?
No matter how replete Aristotle's genetic code might have been,
it did not contain information that made him able to deal with
quadratic functions. Far less gifted mathematics concentrators in
Harvard College today do it much more readily with […]
code to go on. So perhaps it would be better to ask the question
in the reverse direction. To what extent does the working out of
the genetic code, that part of it having to do with intellectual
capacity, depend upon instruments, appliances, formulae, and
other intellectual prosthetic devices? The issue in its barest form
is rather startling upon reflection, for its resolution governs how
we conceive of instruction, curricula, and the other means where-
by we equip human beings to grow.

First, how do human beings construct models of their world, and
how do these change with growth? Second, how do these models
become sufficiently general so that they fit a wide variety of
situations we encounter? Third, how do we use models to go
beyond the information given under our noses? And finally,
what has all this to do with inheritance?

[Section heading illegible in source.]

What does it mean to translate experience into a model of the
world? Let me suggest that there are probably three ways in
which human beings accomplish this feat. The first is through
action. We know many things for which we have no imagery
and no words, and they are very hard to teach to anybody by
the use of either words or diagrams and pictures. If you have
tried to coach somebody at tennis or skiing or to teach a child to
ride a bike, you will have been struck by the wordlessness and
the diagrammatic impotence of the teaching process. (I heard
a sailing instructor a few years ago involved with two children
in a shouting match about "getting the luff out of the main";
the children understood every single word, but the sentence made
no contact with their muscles. It was a shocking performance,
like much that goes on in school.) There is a second system
of representation that depends upon visual or other sensory
organization and upon the use of summarizing images. We may,
as in an experiment by Mandler (1962), grope our way through
a maze of toggle switches, and then at a certain point after over-
learning, come to recognize a visualizable path or pattern. In
Cambridge, we have come to talk about the first form of repre-
sentation as enactive, the second as iconic. Iconic representation
is principally governed by principles of perceptual organization
and by the economical transformations in perceptual organiza-
tion that Attneave (195[?]) has described: techniques for filling
in, completing, extrapolating. Enactive representation is based,
it seems, upon a learning of responses and forms of habituation.
Finally, there is representation in words or language. Its hall-
mark is that it is symbolic in nature, with the design features of
symbolic systems that are only now coming to be understood.
Symbols (words) are arbitrary (as Hockett [1959] puts it, there
is no relation between the symbol and the thing, so that whale
can stand for a very big creature and microorganism for a
small one), they are remote in reference, and they are
highly productive or generative in the sense that a language or
any symbol system has rules for the formation and transforma-
tion of sentences that can turn reality over on its beam ends, be-
yond what is possible through actions or images. A language,
for example, permits us to introduce lawful syntactic transforma-
tions that make it easy and useful to approach declarative pro-
positions about reality in a most striking way. We observe an
event and encode it: the dog bit the man. From this utterance
we can travel to a range of possible recodings: did the dog bite
the man or did he not? If he did not, what would have happened?
Etc. Grammar also permits us an […]
hypothetical propositions that may have […]
reality: "The unicorn is in the garden";
"[…] a mystery"; "In the beginning was the word."
I should also mention one other important […]
system: its compactability, a property that […]
of the order F = MA or S = ½gt², or "G[…]
grows the golden tree of life." In each case […]
quite ordinary, though the semantic squeeze […]. My
colleague George Miller (1956) has proposed
7 ± 2 as the range of human attention […].
We are indeed limited in our span […];
compacting or condensing is […] means whereby […]
seven slots with gold rather than […].
Now what is abidingly interesting [...] intellectual development is that it seems [...] systems of representation. But [...] The young infant appears [...] about the nature of [...] the course of [...] say this more [...] by a process that [...] notably restricted to action and [...] the nature of objects outside [...] viewing research, but I would [...] some [...]. To [...] by the child [...] upon action [...] hold of a desirable object and [...] and did not exist [...] of how the [...] young child [...]
[...] the result will be screams. If the object is [...], the child will not mind. A little later [...] movement of the hand toward the object suffices to identify [...], and if [...] he is reaching, screams of protest result. Finally, [...] protest if the object is removed [...] it has been fixated, and so on. What the child [...] in that first year or 18 months is some way of giving the object [...] having it muscularly in hand or of [...] it muscularly. It is a limited world, the world of [...]

What appears next in development is a great achievement. Images [...] an autonomous status, great summarizers [...]. [...] may also become a paragon of sensory distractibility. [...] of the laws of vividness and [...] of attention is [...] with this bright thing, which is [...] splendid one, which [...] gives way to the next. And so it goes. Visual memory [...] at this stage [...] and specific. What is [...] is that the child is [...] controlled [...] the situation. The child [...] reproduce, in the form that [...] there can reproduce a pattern of nine glasses [...] varying systematically [...]; he does it as well as a [...] year-old. Just change the position of one glass in the matrix that he has to reproduce, and he is lost. He can copy [...] but he can [...] year-old [...] the other. The difference seems to be a matter of being able to translate the visual experience into a form that can be operated upon, and here is where language is such a superb instrument of thought. For once the child is able to instruct himself in the task by saying to himself that in one direction the glasses get fatter and in the other they get taller, he can change the position [...] quite easily and without regard to orientation.

The child, of course, has language in nearly its full grandeur by the time he is five, in the sense of using it in communication. But this is not the same as using it as an instrument of thought. I do not know quite how to say what it means to have language and use it as an instrument of thought as compared to having it and not using it in this way. But I rather suspect it has something to do with a process whereby the child, to use Sir Frederic Bartlett's (1950) old phrase, turns around on himself and reformulates what he does in a new form. Recall the subjects in the toggle-switch maze reformulating their way through the maze into a simultanizing image rather than representing it only by a successive series of gropings with minimum visual support. So too with language: we seem to turn around on experience, reformulate, and condense it into language. Then we can use the transformative process that language makes possible.

Let me turn now to the issue of the scope of our models and their generic or transferable properties. How does the child learn to group experience into longer chunks so that it takes in longer periods of time and permits one to escape from immediacy? We find an interesting answer to this in our studies of the growth of inference. One of the principal features of growing up intellectually is being able to deal with indirect information. Let me illustrate. The young child has little success at the Twenty Questions game (at, say, age five) because he requires direct information, information that is self-sufficient. Why did a car bump into a tree? The five-year-old is full of direct and immediate tests of this or that hypothesis. A constraining, indirect strategy is beyond him: "Was it night?" "Yes." "Was anything wrong with the car?" etc.

Around seven, he comes to master the use of such strategies. It is interesting that at just about this time the child is also going through two parallel developments. On the one hand, he is learning to create rules of equivalence that join together a set of objects by what logicians speak of as a superordinate rule: that things may be considered alike because all of them exhibit a common characteristic. Before that, equivalence is not the true equivalence of the adult. Banana, peach, potato, milk are eventually all alike because they are all for eating, etc. But before that, banana and peach were alike because they were both yellow, peach and potato both have skins, peach and [...] I had for lunch yesterday. This latter equivalence rule is what Vygotsky (1962) years ago called complexive thinking, and we have reproduced his findings and have been able to write rules for such groupings. They are fantastically complicated rules in the sense that if you gave them to a computer, following them would demand very considerable memory and processing capacity. All such rules deal with local likeness in appearance: chains, keyrings, and so on. The passage to superordinate grouping provides a kind of freedom from the immediacy of local similarities. The other parallel development is the growth of the distinction in the child's thought between appearance and reality.

Pour water into a standard glass. Then pour it from there into a slimmer, taller one. The child of five will say that the second glass has more water because it is taller. The child reckons by appearance. At seven, the picture changes. The child will say that it is the same amount of water to drink really, but it looks bigger.

Superordinate equivalence, appearance-reality distinction, and capacity to deal with indirect information: all within a year or so. We find, moreover, that the child can be aided to achieve this new simplicity by techniques that activate use of language before he encounters visually the real objects he is to deal with. We get him to talk about how things will [...] while the objects are hidden behind a screen and then expose him to them. The results are striking. The new system of representation by language seems to be able to compete under these conditions with the laws of image representation, which contain no such distinctions of equivalence or of indirect information.

The growing scope of human "models" depends probably upon the opportunity for recoding experience into a language system that contains distinctions like those we have been discussing. How the language "gets into the head from the mouth," to quote a student of mine, is baffling in its details. But it may well depend upon some sort of law of intervening opportunity. Here is where the issue of assisted versus unassisted growth becomes central.

Jerome S. Bruner



Language and the opportunity to use language in a fashion that is (in Dewey's lovely phrase) "a way of organizing thoughts about things" probably develop in interaction with an informal tutor, a parent or some adult member of the linguistic community, who responds contingently to the child's responses by [...] or demanding a recoding. My colleague Roger Brown (in press) has shown the manner in which, in learning the syntactical structure of a language, the child first uses highly telegraphic utterances ("mummy coffee") which the parent then expands and idealizes to provide the child with a model ("Yes, mummy is having some coffee."). There are, very likely, games of this order that we quite unwittingly play with the child, and we know precious little about them.

But there are probably other things than language and symbolism that operate here. I have tried in vain to find something in the literature that is reliable on how a child scans his environment, whether he has uneconomical techniques for getting information. We have had to start experiments on our own. We have tried to find something about the child's immediate memory span or attention span. How many things can he hold in mind at once, or, to use current jargon, what is his "channel capacity"? Again the literature is moot, so we shall put our mobile lab to work. But each of these things may be critically important. If the child's information search in the visual field is informationally inefficient (as we suspect it is), he will overload himself with too much material to cope with. If he cannot deal simultaneously with several alternatives, then again he cannot deal with equivalence problems, which require that one carry over a criterion of grouping through several different items of apparent diversity. I wish this were two years from now so that I could vouchsafe a guess on this matter. My colleague George Miller would probably argue that there is nothing in this, that information capacity is probably not variable but is rather a matter of developing structures such that the Magic Number 7 ± 2 is filled with purer and purer gold.
What can be said about the "logic" that operates in thought during the sway of enactive, ikonic, and symbolic representation? Here I am going to [...] and say that far more research is needed. I would like to hazard a guess that may serve for [...] as a working hypothesis. I think that at the earliest enactive phase, principles of organization are the classic laws of frequency, recency, and proximity. What "goes together" as a model is that which has produced recent, frequent, and next-to responses. Probably Guthrie's (1952) psychology of learning or Pavlov's is the best description of early infancy. Inhibition at this stage depends upon stopping behavior by setting up a competing response. Learning is slow, gradual, and statistical. At the ikonic level, I would guess that the principles of figure formation and perceptual grouping determine the manner in which events are put together. It is the logic of appearances. The possibility of change depends upon perceptual reorganization, getting things to look different, which is swift and rather erratic in its effects. It would be foolish in the extreme to assert that the rules of language usage or empirical logic or any other such thing dominate the forming and reforming of models of experience in the symbolic phase of representation. For the fact of the matter is that at this stage there is enormous flexibility. My guess about the rules of thought when language takes over is simply to remain moot and observe. What I rather guess is that it is here that instruction becomes critically important. How the child uses symbolic representation in thought is partly a function of what behavior he has turned back upon to recode and upon the power and complexity of the rules that he has learned to use in this reflective process.

Man's history as a species suggests that there have not been any interesting, and certainly no major, morphological changes in man for some hundreds of thousands of years. As I have already suggested, he has progressed by linking himself with outside systems: evolution becomes alloplastic rather than automorphic, to use the technical language. Man's survival as a species, then, depends upon his flexibility in using means for amplifying his muscle power, his senses, and his ratiocinative capacities. As P. Medawar (1963) has recently put it, evolution after the invention of a linguistic tradition becomes Lamarckian and reversible, but not as these terms were originally understood. Evolution achieves a new status by virtue of operating outside the genetic code. The further evolution of the species, then, depends upon the extent to which [...] develops the intellectual prosthetic devices of the culture. If a given generation succeeds well, then the next generation climbs on its shoulders. In an ironic vein, we can turn Haeckel's formulation on its head. Where human evolution is concerned, it is the case that phylogeny recapitulates ontogeny rather than vice versa.
With this in mind, let me suggest one further point. Might it not be the case that the unlocking of the human genetic code (or that part having to do with intelligence) depends upon the invention of new amplifiers of human powers: new prosthetic devices, if you will? Because human evolution in the morphological sense seems to have determined the species as tool users (tools both hard and soft), we shall never know the full capacity of man until tool-using reaches the highest point it can reach, and here I mean those most powerful tools of all, intellectual ones. But might it not also be the case that the most interesting thing about intellectual tools is that their principal use is not print-out into technology alone, but that they make possible the creation of or the mastery of even more powerful tools? In this sense, evolution progresses by a system of prerequisites that is quite familiar to the teacher in us.
I end, then, with the paradoxical note that what we know about human growth suggests that education and the trained use of mind constitute our major agents for further evolution. To a group such as this, I can only add one point to make the conclusion directly relevant. One thing that has not been sufficiently part of the objective of testing is to discern what the far limit of man's capacities is at any given time, particularly during the fast-developing years of childhood. Vygotsky (1962) commented years ago that perhaps it would be a good idea if we tested in children the so-called "zone of potential intelligence": how much a child can make of the best hints we might give him, the best trots, the best tools, the maximum theoretical props and formulae. I am being deadly serious when I suggest that rather than testing under neutral conditions, we test under the most optimum conditions possible. To what extent, we should be asking in our tests, are the schools and the other agents of [...] using this child's capacities? Our [...] of testing too often asks only about aptitude and achievement. I would be delighted to see a year given over to teaching-and-teaching-and-teaching-and-testing to see how far children can be brought along the Lamarckian way. Then, and only then, can testing serve us with benchmarks of not where a child is but where he is capable of going. And when we have fully exploited where the individual is able to go, then we will be in a position to estimate where the species might go.
Over the years, the types of individuals represented at this conference have devoted untold hours to exploring the nature of ability. From time to time a new line of investigation or a new research technique has seemed to promise a breakthrough to some sort of fundamental truth or understanding regarding the organization and development of ability. (Kornhauser's 1944 questionnaire to 79 specialists in mental tests found 55 more hopeful of research on separate intellectual factors than on measurement of "general" intelligence, with only five committed to the opposite view.) Other times it has seemed that evidence from different sources was conflicting, if not contradictory or irreconcilable. The notion that "you get out of such a study just about what you put into it" has often seemed both true and discouraging for those who had hoped definitive findings could clear the scene of prevailing confusions.

Today, however, there seems to be emerging a possibility of reconciliation based on what might be called multiple or pluralistic truth. Physicists have learned to live comfortably for a generation or more with the fact that some of the behavior of light is well described by wave theory while other phenomena of this area are better explained by a view of light as the behavior of corpuscles moving under laws that fit the behavior of billiard balls equally well. The reconciliation in our field of mental ability is following a hierarchical view of the nature of ability.
Under this view, attributable chiefly to Vernon and clarified greatly by Humphreys' delineation of the relation between orders of factors and levels of the hierarchy, an interpretation like Spearman's, of a [...] primary [...] by satellite uncorrelated specific factors, may [...] at the most general level of coping intellectually with the demands of the environment. At a second level, intellectual ability is represented by the two constellations we have come to call variously verbal and nonverbal, language and nonlanguage, or verbal and performance.

At a third level, these constellations may be subdivided further. What has been called verbal at the second level subdivides into verbal and quantitative reasoning factors. (Perhaps the "verbal" category at the second level is better called academic or scholastic ability, to conserve a term, especially since the most direct derivation of "verbal" is from "word" rather than from a more inclusive source.) At the same time, the performance or nonverbal constellation breaks down into spatial, mechanical, and perceptual factors.

At still other levels we find the primary mental abilities of Thurstone and his followers. At what we should probably call the penultimate level, we have Guilford's structure of intellect with its instructive taxonomic uses. I say "penultimate" because one must conceive the possibility of still further refinement. To turn to science again for a helpful parallel: successively finer subdivisions of matter below the atoms that were once defined as the ultimate units of matter have given us power of which we could only have dreamed. If the structure of intellect may be thought of as the periodic table of abilities, vive les isotopes!
The catholicity of this viewpoint is even greater than has already been suggested because it leaves room for different causal interpretations of the different orders of factors. If one is impressed, as J. McV. Hunt is, by the analytical work of Piaget regarding the development of general strategies of thinking in children, by the speculations of Hebb regarding the development of intellect by stimulation and elaboration of the central neural processes that intervene between the sensory and motor, and by the logic of Ferguson, the factor-theorizing of Humphreys, and the factoring of growth by Hofstaetter, he may prefer the notion of general intelligence [...] and E. L. [...] but interacting and correlated skills. The behavior at the general level of such an intelligence and Spearman's g will not be statistically differentiated, so both may be accommodated until more fundamental experimentation can resolve the issue. One with this viewpoint may go as far as Piaget to disregard entirely interindividual trait variability and concentrate on just the general level in the hierarchy. He may consider factors of lower order just relatively unimportant, as Hunt seems to. He may, like Humphreys, reject the notion of factors as primary mental abilities and simply think of them in descending order of significance. Or one may go all the way with J. P. Guilford in ascribing possible dynamic significance to the factors of the structure of intellect.

Two personal comments. Your speaker would concur in Cronbach's view of the importance of criterion-oriented tests like the Differential Aptitude Tests, which tend to fall at the third level in the hierarchy as I have described it. And on further review, I believe I may have inserted that level into a model that otherwise moved from the level of language-nonlanguage directly to the primary mental abilities. For I have always felt that, like the D.A.T., the Scholastic Aptitude Test of the College Entrance Examination Board is to be found at this level. Verbal reasoning ability and quantitative reasoning ability are basic academic skills, relating reasoning power to two functionally distinct media in the academic curriculum. They were identified by factor analysis quite early, by Brigham and T. L. Kelley, and they remain functional unities. For school purposes they are predictive and more meaningful than a scheme of three primary mental abilities: a large reasoning factor and two disembodied factors of trivial skills of numerical manipulation and verbal association. And because they merge reasoning and relevant skills in power tests, they are predictive of achievement in definable sub-segments of the academic curriculum.



The second comment may [...] by some to be merely semantic, but it seems [...] speaker. It has to do [...] placement of [...] the structure of intellect [...] kit of reference tests [...] new tests or [...] the primary mental abilities and the structure of [...] relegate skill in mathematics to a low place at table [...]. Something more characteristic of [...] than of [...]. Yet in the [...] of the kit of [...], "general" reasoning ability is represented by four tests, of which [...] are mathematical reasoning tests! It is quite possible to reach this position [...] adherence to factorial [...] relation between tests of mathematical reasoning at the third level of the hierarchy to [...] "general" reasoning ability at the penultimate level. One is tempted to ask which placement involves the most parsimonious description.
A corollary feature of the hierarchical model is that it allows lower order factors to be used either for their own sakes, as significant [...] level of understanding or using mental ability factors, or as guides to proper balance in the measurement of mental ability factors of a higher order. For example, the verbal and quantitative reasoning factors that emerged first from rudimentary methods in the years B.T. (before Thurstone) were used by McNemar and associates in redressing the balance in the Stanford-Binet. Criticisms of the 1916 version of that battery had included the observation that at its lower age levels verbal items and exercises predominated, while at the upper levels quite as great a predominance of quantitative elements could be found. Analysis by factor methods showed the extent of this disparity statistically and was used to guide the choice of elements at all levels in the 1937 and subsequent revisions. The Stanford-Binet still yields a single score for general mental ability, but with due regard for balance within the sub-areas measured at the third level. If there is any question raised now, it is probably for failure to take adequate account of the two major areas at the second level, language versus nonlanguage, as the Wechsler measures do explicitly.
To turn for the moment to current methods of appraising achievement in schools and colleges, let us pay tribute to the [...] contribution by Ralph Tyler and associates for their work in the 1930's in breaking [...] encyclopedic knowledge that characterized the first wave of achievement tests and batteries that had been developed in the preceding decade. The companion contribution of item-styles that measured higher mental processes through multiple-choice [...] of equal significance. By 194[...], concomitant thinking of Lindquist had led to construction of the Iowa Tests of Educational Development, which were ready to serve the important purpose of the United States Armed Forces Institute in measuring readiness of returning scholar-soldiers for college work or, at least, high school credit. (It should be noted here that earlier traces of this approach were to be found in the Iowa Every-Pupil Tests of Basic Skills, for grades 3-9, and in the Cooperative Achievement Tests being developed under Flanagan.)

The USAFI Tests of General Educational Development were work-limit tests. With their omission of time limits, they set a realistic miniature of the school study situation and thereby provided a yardstick; in subsequent test building, many agencies could accept the concept of power tests with generous time limits permitting many students to finish early. The time limit now became a means of assuring most examinees an opportunity to give as much time as necessary to complete the power-graded materials, rather than a uniform time in which to accomplish as much material (often of only moderate difficulty) as possible in a time in which only the most competent and facile could hope to finish.

The trend we hailed, of shifting the emphasis in achievement testing from memoriter knowledge to ability to apply such knowledge, has an opposite source of concern. In preparing tests to ascertain whether examinees can apply knowledge, some have gone so far as to remove in large part any requirement that the examinees draw upon a background of well-structured, important knowledge in answering the questions posed. A generally bright person with ability to interpret verbal, quantitative, and graphic material may obtain high scores on such tests despite having failed [...] the systematic [...] field we increasingly feel we should demand.

It is relevant to recall the experience of Kelley and Krey in the American Historical Association's 1934 report on their study of the teaching of the social studies. It had been intended to build a test depending entirely on ability to apply knowledge, but by mistake a factual item on the Sepoy Mutiny had been left in the test. When the many items of this test were correlated with total score, the item showing the highest correlation was the one inadvertently included. The explanation given was that the Sepoy Mutiny was such an important incident in the history of British colonial administration that better students, however defined, would be bound to remember it for that reason. Our ideal is, of course, a test in which both background and application are required in each item.
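The Kelley and Krey result rests on a routine item-analysis step: correlating each item with the total score. As a minimal sketch, using hypothetical toy responses rather than their data, the Pearson correlation between a 0/1 item and the total (the point-biserial) can be computed as below. This is the uncorrected form, in which the item's own score is part of the total, which slightly inflates each coefficient.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def item_total_correlations(responses):
    """Correlate each item's 0/1 scores with examinees' total scores.

    `responses` has one row per examinee and one 0/1 column per item.
    The item is included in the total (uncorrected item-total correlation).
    """
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson_r([row[j] for row in responses], totals) for j in range(n_items)]


# Hypothetical toy data: 6 examinees x 4 items (1 = correct). Not from Kelley and Krey.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

rs = item_total_correlations(responses)
best = max(range(len(rs)), key=lambda j: rs[j])
print("item-total r:", [round(r, 2) for r in rs], "highest item index:", best)
```

An item with a high item-total correlation simply separates high and low scorers well. That is why a single well-remembered factual item can top the list even in a test meant to measure application.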

The USAFI Tests of General Educational Development afford a natural bridge to our third topic, relating ability to performance. It is the thesis of this paper that the hierarchical view of mental ability stated first lends itself to an eclectic approach to the use of measures in predicting achievement in any given situation. At no point was ability defined as inherited or the product of inevitable processes of maturation. For this reason, any measure predictive of likely success in any particularly defined intellectual arena is an appropriate measure of aptitude for that success. (Substitution of PLR, meaning Probable Learning Rate, for IQ in a number of school systems has semantic merit.) In the case of the USAFI GED Tests, their use with returning soldiers to predict fitness for college study was enhanced by the extent to which they suppressed the requirement of knowledge ordinarily available in systematic form from recent advanced study. In a situation in which systematic knowledge from recent study is available, that may well be drawn upon in testing, or through the evidence of school grades, to supplement testing that does not require it.
The use of measures of ability to predict performance has been subjected to illuminating systematic treatment by one of this morning's speakers, Dr. Thorndike. I have enjoyed the quote attributed to him in news releases regarding his recent monograph on "The Concepts of Over- and Underachievement" that the term "underachiever" should be reserved for those who conceived the terminology in the first place. After pondering the logical fallacy implied in the terms, namely that if an underachiever is one who has done less than he can do, an overachiever must be one who has done more than he could do, I can only suggest the following substitutes: (1) We are all underachievers; only some are more so; and (2) an overachiever is simply an under-underachiever.

The constructive view emerging from all of this would appear to be that particular learning situations place demands on particular combinations of intellectual abilities, that these abilities are of different orders of generality in the hierarchical structure, and that all measures of abilities are measures of achievement of the appropriate order of generality. As long ago as 19[...], Bingham stated the case for achievement as the best predictor of further achievement. Wesman has presented the statement in brief, persuasive form. Our problem then would appear to be to use that combination of measured abilities most descriptive of aptitude, i.e., most predictive in particular situations.

It is therefore no departure from sound conceptualization to propose appraising reading comprehension relative to listening comprehension, as has been done for years in the Durrell-Sullivan Reading Capacity and Achievement Tests. Other predictive helps become appropriate in special situations. Generally, measures of verbal mental ability at the second or third level will be helpful. Generally, measures of performance mental [...] helpful. On the other hand, where special [...] like bilingualism are involved, something [...] will not be [...] language handicaps [...] nonverbal, will have advantages.

In reviewing the manual for the current edition of the Metropolitan Achievement Tests recently, it was interesting to note that the measure proposed for evaluating learning potential is a "composite prognostic score" composed chiefly or entirely of the previous year's achievement measures. Justification is given in terms of the greater stability of this type of composite over a year (.90) than of a well-regarded group test of mental ability over the same period (.80). Other studies of the same test by staff members suggest the desirability of different predictive equations for different subjects, depending on the size of the correlation, and even on different combinations of subjects, chiefly dependent on the distinction between verbal and quantitative reasoning abilities, of the sort distinguished at the third level in the hierarchical model proposed in this paper.
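The stability argument has a simple psychometric basis: a sum of several positively correlated subtests usually carries proportionally less error than any one of them. The sketch below applies the standard formula for the reliability of an unweighted composite (one minus the summed subtest error variances divided by the composite variance). It is an illustration, not the Metropolitan manual's procedure; the subtest figures are hypothetical, and errors are assumed to be uncorrelated across subtests.

```python
from itertools import combinations


def composite_reliability(sds, reliabilities, corr):
    """Reliability (or stability) of an unweighted sum of subtest scores.

    sds           -- subtest standard deviations
    reliabilities -- reliability or stability coefficient of each subtest
    corr          -- {(i, j): r_ij} observed intercorrelations, with i < j
    """
    k = len(sds)
    # Composite variance: subtest variances plus twice every covariance.
    var_c = sum(s * s for s in sds)
    for i, j in combinations(range(k), 2):
        var_c += 2 * sds[i] * sds[j] * corr[(i, j)]
    # Error variance of the sum, assuming errors are uncorrelated across subtests.
    err_var = sum(s * s * (1 - r) for s, r in zip(sds, reliabilities))
    return 1 - err_var / var_c


# Hypothetical subtests: each has stability .80 and they intercorrelate .60.
sds = [10.0, 10.0, 10.0]
rels = [0.80, 0.80, 0.80]
corr = {(0, 1): 0.60, (0, 2): 0.60, (1, 2): 0.60}
print(round(composite_reliability(sds, rels, corr), 3))  # 0.909
```

With these assumed values the composite reaches about .91 although no single subtest exceeds .80.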
A final note, that does not fit well into the general framework of this paper, deserves mention. For some time, the demand that measures of ability to write effective prose composition be included in standardized test batteries has been met with the response from test specialists that such writing cannot be measured reliably in the time ordinarily allotted to standardized testing in schools or in external testing programs. Some recent research indicates that global rating of pieces of writing on specified topics, with the reliability enhanced as far as feasible by multiple rating, will permit scores of adequate reliability and validity to be reported. We may well still counsel that cumulative evidence of writing ability be obtained by systematic evaluation of weekly compositions; but it seems to be becoming increasingly possible to appraise such outcomes within test batteries and thereby give comparable emphasis to this skill outcome along with others ordinarily appraised by objective tests.
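How much multiple rating can help is usually estimated with the Spearman-Brown formula, which predicts the reliability of the average of k ratings from the reliability of a single rating. The sketch assumes the readers are interchangeable (equally reliable, with independent errors); the single-reader value of .40 is hypothetical, not a figure from the research mentioned here.

```python
def spearman_brown(r_single, k):
    """Predicted reliability of the mean of k parallel ratings (Spearman-Brown)."""
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical reliability of one reader's global rating of an essay.
r1 = 0.40
for k in range(1, 6):
    print(k, "readers:", round(spearman_brown(r1, k), 2))
```

Under these assumptions the gains diminish quickly: the second reader adds far more than the fifth.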

In this paper I will discuss personality measurement primarily in terms of its potential contribution to the prediction of college performance. In this context, two major questions arise: (1) Are personality tests any good as measures of the purported personality characteristics? (2) What should these tests be used for? The first question is a scientific one and may be answered by an evaluation of available personality instruments against scientific standards of psychometric adequacy. The second question is at least in part an ethical one and may be answered by a justification of proposed uses for a test in terms of ethical standards and social or educational values. I will first discuss the scientific standards for appraising personality measures and will then consider how well these standards are typically met by instruments developed by each of three major approaches to personality measurement. The final section of the paper will discuss some of the ethical problems raised when personality measures are used for practical decisions.






*A preliminary version of some portions of this paper was prepared for the Committee of Examiners in Aptitude Testing of the College Entrance Examination Board. [...] suggestions about the nature of the problem and the organization of the material. Grateful acknowledgment is also due Sydell Carlton, Norman Frederiksen, John French, Nathan Kogan, and Lawrence Stricker for their helpful comments on the manuscript.
Personality Measurement

The measurement requirements in personality, as in psychology generally, involve (1) the demonstration, through substantial consistency of response to a set of items, that something is being measured; and (2) the accumulation of evidence about the nature and meaning of this "something" in terms of the network of the measure's relations with theoretically relevant variables and its lack of relations with theoretically unrelated variables (Cronbach & Meehl, 1955; Loevinger, 1957; Bechtoldt, 1959; Campbell & Fiske, 1959; Campbell, 1960; Ebel, 1961). In psychometric terms, these two critical properties for the evaluation of a purported personality measure are the measure's reliability and its construct validity.
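The first requirement, substantial consistency of response to a set of items, is commonly indexed by an internal-consistency coefficient. The sketch below computes coefficient alpha, which for yes/no items reduces to Kuder-Richardson formula 20. The response matrix is hypothetical, and alpha is offered as one conventional index rather than as the one the cited authors prescribe.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondents' item-score lists."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical 5 respondents x 4 items, each item scored 1-5.
responses = [
    [4, 5, 4, 3],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 2))
```

Alpha rises as items covary more strongly relative to their individual variances, which is what "consistency of response" means operationally here.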

An investigation of the measure's relations with other well-known variables may also provide a basis for determining whether the thing measured represents a relatively separate dimension with important specific properties or whether major variance is predictable from a combination of other, possibly more basic, characteristics. Such information bears upon the status of the construct as a separate variable and upon the structure of its relations with other variables.

Whether the measure reflects a separate trait or a combination of characteristics or, indeed, whether the proposed construct is a valid integration of observed response consistencies or merely a gratuitous label, there is still another important property of the measure that can be independently evaluated, namely, its usefulness in predicting concurrent and future non-test behaviors as a possible basis for decision making and social action. For such purposes, which primarily include classification and selection situations, it is necessary that the measure display predictive validity in the form of substantial correlations with the criterion measures chosen to reflect relevant performances in the non-test domain. Although some psychologists would argue that such predictive validity is all that is necessary to warrant the use of a measure in making practical decisions, it will be maintained here that predictive validity is not sufficient and that it may be unwise to ignore construct validity even in practical prediction problems (Gulliksen, 1950; Frederiksen, 1948; Frederiksen, 1954). This point will be discussed more fully later.

Just as a test has as many empirical validities as there are criterion measures to which it has been related, so too may a test display different proportions of reliable variance or reflect different construct interpretations, primarily because the motivations and defenses of the subjects are implicated in different ways under different testing conditions. Thus, instead of talking about the reliability and construct validity (or even the empirical validity) of the test per se, it might be better to talk about the reliability and construct validity of the responses to the test, as summarized in a particular score, thereby emphasizing that these test properties are relative to the processes used by the subjects in responding (Lennon, 1956). These processes, in turn, may differ under different circumstances, particularly those affecting the conceptions and intentions of the subjects. Thus, the same test, for example, might measure one set of things if administered in the context of diagnostic guidance in a clinical setting, a radically different set of things if administered in the context of anonymous inquiry in a research laboratory, and yet another set if administered as a personal evaluation for industrial or academic selection. Furthermore, these different testing settings impose different ethical constraints upon the manner and conditions of eliciting personal, and what the subject may consider private, information ("Standards of Ethical Behavior," 1958; Cronbach, 1960).

This point, that personality tests, and even personality testers, may operate differently under different circumstances, was one of the main reasons I initially chose to limit the present discussion to a particular context, namely, personality measurement in relation to college performance. Various contexts differ somewhat in the types of problems posed for personality measurement, but the timely context of assessment for college contains nearly all the problems at once. Of major concern in considering this context, however, is the inherently evaluative atmosphere of the testing settings. This means that we must take into account not only the ubiquitous response distortions due to defense mechanisms of self-deception and personal biases in self-regard (cf. Frenkel-Brunswik, 1939), but also the distortions in performance and self-report that are at least partially deliberate attempts at faking and impression management (cf. Goffman, 1959).

The extent to which attempts are made to handle the problems of both deliberate misrepresentation and unintentional distortion becomes an important criterion for evaluating personality instruments, particularly for use in evaluative settings. Many personality measures have been developed in research contexts where deliberate misrepresentation may have been minimal; little is known of their psychometric properties under conditions of real or presumed personal evaluation. Some personality tests include specific devices for detecting faking, such as validity or malingering keys, which would enable students with excessive "lie" responses to be spotted and would also permit the use of the control scores as suppressor variables in correcting other scales (Meehl & Hathaway, 1946). Other personality instruments rely on test formats that attempt to make faking difficult, such as the use of forced-choice techniques on questionnaires or of objective performance measures where the direction of faking is not obvious. Still other procedures use indirect items and disguised facades to circumvent the subject's defensive posture (Campbell, 1950; Campbell, 1957; Loevinger, 1955).
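The suppressor logic can be seen in the two-predictor regression formulas. In the hypothetical case below, a content scale correlates .30 with a criterion, while a control scale correlates zero with the criterion but .50 with the content scale, because both pick up the same test-taking attitude. Giving the control scale a negative weight removes that shared attitude variance and raises the multiple correlation. This illustrates the general suppressor principle only; it is not a reproduction of the MMPI K-correction weights.

```python
from math import sqrt


def two_predictor_regression(r_y1, r_y2, r_12):
    """Standardized regression weights and multiple R for two predictors."""
    denom = 1 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = b1 * r_y1 + b2 * r_y2
    return b1, b2, sqrt(r_squared)


# Hypothetical: content scale (x1), control scale (x2), criterion (y).
b1, b2, multiple_r = two_predictor_regression(r_y1=0.30, r_y2=0.0, r_12=0.50)
print(round(b1, 2), round(b2, 2), round(multiple_r, 3))
```

Here the control scale receives a weight of about -.20 and the multiple correlation rises from .30 to about .35, even though the control scale alone predicts nothing.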

Psychometric Problems in Some Typical Approaches to Personality Measurement

We have considered several psychometric criteria for evaluating personality measures: reliability, empirical validity in predicting criteria or non-test behaviors, the structure of relations with other known variables, the adequacy of controls for faking and distortion, and, more basic because it subsumes aspects of the preceding properties, construct validity. We will now inquire how well these standards are typically met by instruments developed by three major approaches to personality measurement: self-report questionnaires, behavior ratings, and objective performance tests.



Before various types of self-report questionnaires are discussed, the general problem of stylistic consistencies or response sets on such instruments should be broached (Cronbach, 1950; Jackson & Messick, 1958). A major portion of the response variance on many personality inventories, particularly those with "True-False" or "Agree-Disagree" item formats, has been shown to reflect consistent stylistic tendencies that have a cumulative effect on presumed content scores (e.g., Edwards, 1957; Edwards, Diers, & Walker, 1962; Jackson & Messick, 1961, 1962a). The major response styles emphasized thus far are the tendency to agree or acquiesce (Couch & Keniston, 1960; Messick & Jackson, 1961), the tendency to respond desirably (Edwards, 1957; Messick, 1960), the tendency to respond deviantly (Berg, [...]; Sechrest & Jackson, 1963), and, to a lesser extent, the tendency to respond extremely in self-rating (Peabody, 1962). These response styles have been conceptualized and studied as personality variables in their own right (Jackson & Messick, 1958), but their massive influence on some personality inventories can seriously interfere with the measurement of other content traits (Jackson & Messick, 1962b). The problem becomes one of measuring response styles as potentially useful personality variables and at the same time controlling their influence on content scores (Messick, 1962; Wiggins, 1962). The extent to which controls for response styles have been effective in reducing overwhelming stylistic variance becomes an important criterion in evaluating the measurement characteristics of self-report instruments.
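One widely used control for acquiescence is balanced keying, in which half the items are keyed "True" and half "False." The toy example below, with a hypothetical ten-item scale, shows why it helps: a respondent who simply agrees with everything earns the maximum content score on an all-"True" key but only a middling score on a balanced key, while a separate count of "True" answers can be kept as a measure of the style itself. Balanced keying is offered only as an illustration; it does nothing about the desirability style, which needs other controls.

```python
def content_score(answers, keys):
    """Number of answers that match the scoring key (True/False per item)."""
    return sum(a == k for a, k in zip(answers, keys))


def acquiescence_score(answers):
    """Number of 'True' answers, regardless of item content."""
    return sum(answers)


yea_sayer = [True] * 10          # hypothetical respondent who agrees with every item
all_true_key = [True] * 10       # every item keyed 'True'
balanced_key = [True, False] * 5  # half keyed 'True', half keyed 'False'

print(content_score(yea_sayer, all_true_key))  # 10
print(content_score(yea_sayer, balanced_key))  # 5
print(acquiescence_score(yea_sayer))           # 10
```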

We will consider three kinds of self-report or questionnaire measures of personality: (1) a type that I will call a factorial inventory, in which factor analysis or some other criterion of internal consistency is used to select items reflecting homogeneous dimensions (Cattell, 1957; Comrey, 1962); (2) empirically derived inventories, in which significant differentiation among criterion groups is the basis of item selection; and (3) rational inventories, in which items are chosen on logical grounds to reflect theoretical properties of specified dimensions.

Factorial inventory scales are developed through the use of factor analysis or other methods of homogeneous keying (Wherry & Winer, 1953; Loevinger, Gleser, & DuBois, 1953; Henrysson, 1962) to isolate dimensions of consistency in response to self-descriptive items. The pool of items collected for analysis usually consists of a conglomeration of characteristics possibly relevant to some domain and sometimes includes items specifically written to represent the variables under study.
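Homogeneous keying can take many forms; one simple version keeps an item only if it correlates adequately with the sum of the other items. The sketch below makes a single pass with a hypothetical cutoff of .30 over a small invented response matrix. The procedures cited in the text are more elaborate; this shows only the basic selection criterion.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_homogeneous_items(responses, cutoff=0.30):
    """Indices of items whose corrected item-total correlation meets cutoff."""
    kept = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total without item j
        if pearson(item, rest) >= cutoff:
            kept.append(j)
    return kept


# Hypothetical yes (1) / no (0) answers: items 0-2 hang together, item 3 does not.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
print(select_homogeneous_items(responses))  # [0, 1, 2]
```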

The most widely known of the current factored inventories are the Cattell 16 Personality Factor Questionnaire and the Guilford-Zimmerman Temperament Survey. Becker's (1961) recent empirical comparison of the Cattell questionnaire with an earlier form of the Guilford scales has revealed an equivalence between four factors from the two inventories and substantial similarity for two other factors. Although considerable factor analytic evidence at the item level generally supports the nature of the scales (Cattell, 1957; Guilford & Zimmerman, 1956), when two subscale scores were used to represent each factor supposedly measured by these inventories, Becker (1961) found only eight distinguishable factors within the 16 PF and only five within 13 Guilford scales.

These factorial inventories were developed primarily in research settings, so that attention must be given to possible defensive distortions induced by their use in evaluative situations. Although procedures for detecting faking have been suggested, their systematic use has not been emphasized, nor has their effectiveness been clearly demonstrated. Further, empirical controls for response styles have usually not been included, although their operation has recently been noted on some of the factor scales (Bendig, 1959; Becker, 1961).

In the construction of empirically derived inventory scales, items are selected that significantly discriminate among criterion groups. The most widely known examples are scales from the Minnesota Multiphasic Personality Inventory (MMPI) and from the California Psychological Inventory (CPI). The justification of these scales is in terms of their empirical validity and their usefulness in classifying subjects as similar or dissimilar to criterion groups. Scale homogeneity, reliability, and construct validity are seldom emphasized. The difficulty arises when these scales are used not to predict criterion categories but rather to make inferences about the personality of the respondent. This latter use has become the typical one (cf. Welsh & Dahlstrom, 1956), but such application cannot be justified by empirical validity alone; homogeneity and construct validity become crucial under such circumstances (Cronbach, 1958; Jackson & Messick, 1958, 1962b).
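Empirical keying of this kind amounts to comparing how often each item is endorsed by a criterion group and by a comparison group. The sketch uses a two-proportion z test at roughly the .05 level; the group sizes, endorsement proportions, and the choice of test are assumptions made for illustration, not the MMPI or CPI procedures. Note that nothing in the selection rule looks at how the retained items relate to one another, so the resulting key carries no built-in guarantee of homogeneity.

```python
from math import sqrt


def endorsement_z(p_crit, n_crit, p_norm, n_norm):
    """z statistic for the difference between two endorsement proportions."""
    pooled = (p_crit * n_crit + p_norm * n_norm) / (n_crit + n_norm)
    se = sqrt(pooled * (1 - pooled) * (1 / n_crit + 1 / n_norm))
    return (p_crit - p_norm) / se


def empirical_key(p_crit, p_norm, n_crit, n_norm, z_crit=1.96):
    """Map item index -> keyed answer (True/False) for items whose endorsement
    rate differs significantly between criterion and comparison groups."""
    key = {}
    for j, (pc, pn) in enumerate(zip(p_crit, p_norm)):
        z = endorsement_z(pc, n_crit, pn, n_norm)
        if abs(z) >= z_crit:
            key[j] = z > 0  # keyed 'True' if the criterion group endorses it more
    return key


# Hypothetical proportions answering 'True' in each group.
p_crit = [0.70, 0.50, 0.20]
p_norm = [0.40, 0.48, 0.45]
print(empirical_key(p_crit, p_norm, n_crit=100, n_norm=200))
```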

Because of their widespread use in clinical settings, considerable attention has been given to the problem of faking, particularly on the MMPI. Several scales are available for detecting lying and malingering (L, Mp, Sd, etc.), along with a validity scale (F) for uncovering excessive deviant responses (Dahlstrom & Welsh, 1960). A measure of "defensiveness" (K) is also used both as a means of detecting this tendency and as a suppressor variable for controlling test-taking attitudes (Meehl & Hathaway, 1946). Several studies of the effectiveness of these scales have indicated a somewhat variable, and usually only moderate, level of success (cf. Welsh & Dahlstrom, 1956; Wiggins, 1959).
A major problem on the MMPI and CPI is the predominant role of the response styles of acquiescence and desirability, which in the former instrument define the first two major factors and together account for roughly half the total variance (Jackson & Messick, 1961, 1962a; Jackson, 1960; Edwards, Diers, & Walker, 1962). Presumably, these response styles are correlated with the criterion distinction utilized in the empirical scale construction (cf. Wahler, 1961), but their massive influence on these inventories drastically interferes with the attempted measurement of other content traits and limits their possible discriminant validity (Jackson & Messick, 1962b).

Rational inventories comprise items that have been written on theoretical or logical grounds to reflect specified traits. That such scales measure something is demonstrated subsequently by high internal consistency coefficients; that they measure distinguishable characteristics is shown by relatively low scale intercorrelations. Factor analysis is also sometimes used subsequently to investigate scale interrelations (Stern, 1962). On some of these inventories, such as Stern's Activities Index, little attention has been given initially to the role of response styles, while on others, such as the Edwards Personal Preference Schedule (EPPS), the major attraction has been the attempt to limit stylistic variance. The EPPS employs a forced-choice item format: statements are presented to the subject in pairs, the members of each pair having been previously selected to be as equal as possible in average judged desirability. The respondent is required to select from each pair the statement that better describes his personality. Such forced-choice items do not offer opportunity for the response style of acquiescence to operate. Further, since the paired statements are also approximately matched in desirability, a consistent tendency to respond desirably should in principle have relatively little effect upon item choices (Edwards, 1957; Corah et al., 1953; Edwards, Wright, & Lunneborg, 1959). Even though desirability variance is not eliminated thereby, primarily because of the existence of consistent personal viewpoints about desirability that cannot be simultaneously equated (Husen, 1956; Borislow, 1958; Heilbrun & Goodstein, 1959; Messick, 1960; LaPointe & Auclair, 1961), the forced-choice approach offers considerable promise for reducing the overwhelming influence of response styles on questionnaires (Norman, 1963b). Unfortunately, the EPPS can still not be recommended for other than research purposes, because insufficient evidence exists concerning its empirical and construct validity (Stricker, 1963).
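The pairing step behind a forced-choice inventory can be sketched as a matching problem: sort statements by judged desirability and pair statements from different scales whose values lie within a tolerance. The greedy procedure, the 0.25 tolerance, and the statements and scale values below are hypothetical; they illustrate desirability matching, not how the EPPS pairs were actually constructed.

```python
def pair_by_desirability(statements, tolerance=0.25):
    """Greedily pair statements from different scales whose judged
    desirability values differ by no more than `tolerance`.

    statements: list of (label, scale, desirability) tuples.
    Returns a list of pairs; statements left unmatched are dropped.
    """
    pool = sorted(statements, key=lambda s: s[2])
    pairs, used = [], set()
    for i, a in enumerate(pool):
        if i in used:
            continue
        for j in range(i + 1, len(pool)):
            if j in used:
                continue
            b = pool[j]
            if b[2] - a[2] > tolerance:
                break  # later statements are even further away in desirability
            if b[1] != a[1]:  # never pair two statements from the same scale
                pairs.append((a, b))
                used.update((i, j))
                break
    return pairs


# Hypothetical statements with mean judged desirability on a 1-9 scale.
stmts = [
    ("A1", "achievement", 6.1),
    ("D1", "deference", 6.0),
    ("A2", "achievement", 4.2),
    ("O1", "order", 4.3),
    ("D2", "deference", 2.0),
]
for a, b in pair_by_desirability(stmts):
    print(a[0], "vs", b[0])
```

Statements with no close partner (here D2) are simply left out, which is one practical cost of insisting on close desirability matches.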

The different approaches to scale construction that distinguish factorial, empirically derived, and rational inventories might well be combined into a single measurement enterprise, wherein scale homogeneity, construct validity, and the theoretical basis of item content, as well as empirical differentiation, would be successively refined in an iterative cycle (Loevinger, 1957; Norman, 1963b). In this way the differences among the approaches, depending as they would upon the particular point in the cycle that one chose to start with, would become trivial, and scales would be systematically developed in terms of joint criteria of homogeneity, theoretical relevance, construct validity, and empirical utility.

BEHAVIOR RATINGS

Behavior ratings represent a second major approach to personality measurement. Direct ratings of behavior, both of job performance and of personality characteristics, have been frequently employed in educational and industrial evaluation (Whisler & Harper, 1962). Personality ratings, however, have seldom been formally or systematically used in the typical selection situation for many reasons, one of them being the difficulty of obtaining reliable or comparable ratings for candidates coming from different sources. However, if teacher and peer ratings of personality made in college, for example, were to prove valid in predicting behavioral criteria of college success (cf. Tupes, 1957), and if these ratings could, in turn, be predicted by other measures (such as self-report inventories), then the predicted ratings might be useful in pre-college decisions. Behavior ratings that correlate with college success could thus serve as intermediate criteria for validating self-report measures of the same dimensions.
Cattell (1957) has isolated approximately [...] dimensions from behavior ratings, reflecting such qualities as ego strength, excitability, dominance, and surgency. Tupes and Christal (1961), on the other hand, in analyzing the same rating scales and in a few cases the same data, provided evidence for only five strong and recurrent factors, which were labeled extroversion, agreeableness, conscientiousness, emotional stability, and culture (see also Norman, 1963a). Cattell (1957) has also claimed a congruence between most of his behavior rating factors and their questionnaire counterparts, which suggests that questionnaire scales can indeed predict rating dimensions. Cattell's claim of a one-to-one matching of behavior rating and questionnaire factors has been challenged by Becker (1960), however, who concluded that available evidence did not support the alleged relation. Norman (1963b), on the other hand, has clearly demonstrated that questionnaire scales can be developed that will correlate substantially with behavior rating factors. In his particular study, he attempted to predict the five rating factors obtained by Tupes and Christal (1958, 1961) from peer nominations. Since these ratings had previously exhibited substantial validity in predicting officer effectiveness criteria at the USAF officer candidate school (Tupes, 1957), the subsequent prediction of these ratings by questionnaire scales has direct implications for selection.

Incidentally, Norman's (1963b) scale construction procedure involved an extremely promising technique for handling faking in evaluative settings. Items in a forced-choice format, equated for "admission-to-OCS desirability," were administered under normal and faking instructions. In the construction of the scales, the items were balanced between those showing a mean shift under faking instructions in the direction of the keyed response and those showing a mean shift away from the keyed response. Mean scores for the resulting scales were thus equated under normal and faking conditions, and, in addition, powerful detection scales were developed to isolate extreme dissemblers.
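The balancing idea can be expressed as a small selection problem. Each candidate item has a mean shift, faking minus normal, measured in the direction of its keyed response; pairing items that shift toward the key with items that shift away by a similar amount leaves the scale mean nearly unchanged under faking instructions. The greedy pairing rule and the shift values below are hypothetical, and the forced-choice format of Norman's items is ignored for simplicity.

```python
def balance_faking_shifts(shifts, n_pairs):
    """Select items whose keyed-direction mean shifts under faking cancel.

    shifts: dict item -> mean shift (faking minus normal) toward the keyed
            response; positive means faking raises the scale score.
    Pairs the largest remaining positive shift with the negative shift of
    most similar magnitude. Returns (chosen items, net shift of the scale).
    """
    pos = sorted((s, i) for i, s in shifts.items() if s > 0)
    neg = [(-s, i) for i, s in shifts.items() if s < 0]  # store magnitudes
    chosen, total = [], 0.0
    for _ in range(n_pairs):
        if not pos or not neg:
            break
        s_pos, i_pos = pos.pop()  # largest remaining positive shift
        m = min(range(len(neg)), key=lambda k: abs(neg[k][0] - s_pos))
        s_neg, i_neg = neg.pop(m)
        chosen += [i_pos, i_neg]
        total += s_pos - s_neg
    return chosen, total


# Hypothetical item shifts, in scale points, under faking instructions.
shifts = {"q1": 0.40, "q2": -0.38, "q3": 0.15, "q4": -0.17, "q5": 0.02, "q6": -0.60}
chosen, net = balance_faking_shifts(shifts, n_pairs=2)
print(chosen, round(net, 3))
```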

OBJECTIVE PERFORMANCE TESTS

The third major approach to personality measurement considered here is the objective performance test. According to Campbell (1957), an objective measure of personality, like an objective measure of ability or achievement, is a test in which the examinee believes that he should respond accurately because correct answers exist as a basis for evaluating his performance. Cattell (1957), on the other hand, considers a test objective if the subject is unaware of the manner in which his behavior affects the scoring and interpretation, a property that Campbell (1957) prefers to use in the definition of indirect measurement.

Cattell's (1957) analyses of objective performance measures of personality have uncovered approximately 18 dimensions, with such labels as [...] assertiveness, inhibition, anxiety, and critical practicality. Thurstone (1944) and Guilford (e.g., 1959) have also developed measures of several perceptual and cognitive dimensions that represent objective tests of personality. Measures of speed and flexibility of closure (Thurstone, 1944), for example, and of ideational fluency appear as congenial in a personality framework as in the traditional ability formulation (cf. Cattell, 1957; Guilford, 1959; Witkin et al., 1962). Some of Guilford's (1959) work on divergent thinking also deals with stylistic restrictions in the generation and manipulation of ideas, which appear as much like personality consistencies as measures of "maximum performance" abilities (Cronbach, [...]).

In many cases, the objective nature of these tests makes it difficult to decide how to fake, since some look very much like ability tests and appear to have clear adaptive requirements that subjects should strive to achieve. Test properties, however, have been studied primarily in research contexts, where faking may have been minimal; certain characteristics may change under other conditions. Available objective tests also tend to be unreliable, primarily because they have been deliberately kept short for use in large test batteries. Because of practice and order effects on some of the procedures, however, there is no guarantee that high reliabilities can be obtained simply by lengthening the tests.
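The Spearman-Brown formula can be solved for the lengthening factor needed to reach a target reliability. The sketch below does so for hypothetical values. Its answer holds only if the added material is parallel to the original, which is exactly the assumption that practice and order effects undermine, so the computed factor is best read as an optimistic estimate rather than a guarantee.

```python
def lengthening_factor(r_current, r_target):
    """Spearman-Brown solved for k: how many times longer a test must be to
    move its reliability from r_current to r_target, assuming the added
    parts are parallel to the existing ones."""
    return r_target * (1 - r_current) / (r_current * (1 - r_target))


# Hypothetical short objective personality test with reliability .50.
for target in (0.70, 0.80, 0.90):
    print(target, round(lengthening_factor(0.50, target), 1))
```

Even in this idealized case, moving a .50 test to .90 would require about nine times its original length.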

Considerable attention has been given in recent years to certain stylistic dimensions in the performance of cognitive tasks (Witkin et al., 1954; Witkin et al., 1962; Gardner et al., 1959; Gardner, Jackson, & Messick, 1960). These personality dimensions have been conceptualized as cognitive styles, which represent a person's typical modes of perceiving, remembering, thinking, and problem solving. Approaches to the measurement of these variables have routinely included objective procedures. Some examples of these dimensions are (1) field dependence-independence, "an analytical, in contrast to a global, way of perceiving [which] entails a tendency to experience items as discrete from their backgrounds and reflects ability to overcome the influence of an embedding context" (Witkin et al., 1962; see also Kagan, Moss, & Sigel, 1963; Messick & Fritzky, 1963); (2) leveling-sharpening, a dimension where subjects at the leveling extreme tend to assimilate new material to an established framework, whereas sharpeners, at the other extreme, tend to contrast new material with the old and to maintain distinctions (Gardner et al., 1959); and (3) category width preferences, a dimension of individual consistencies in modes of categorizing perceived similarities and differences, reflected in consistent preferences for broad or narrow categories in conceptualizing (Gardner et al., 1959; Gardner & Schoen, 1962; Pettigrew, 1958; Messick & Kogan, 1963a; Sloane, Gorlow, & Jackson, 1963).

Both the cognitive nature and the stylistic nature of these variables make them appear particularly relevant to the kinds of cognitive tasks performed in academic settings. Certain types of subject matter and certain problems or problem formulations might favor broad categorizers over narrow categorizers, for example, or levelers over sharpeners, and vice versa. This "vice versa" is extremely important: since it is unlikely that one end of such stylistic dimensions would prove uniformly more adaptive than the other, the relativity of their value should be recognized. (Incidentally, the possibility of such relativity of value might well be extended to other personality variables where the desirability of one end of the trait has usually been prejudged. What conceptions would change, for example, if "flexibility vs. rigidity" had been called "confusion vs. control"?)

It is quite possible that we have already unwittingly included such stylistic variance in some measures of intellectual aptitude, such as the SAT, but if this is the case, the nature and direction of its operation should be specified and controlled. It is possible, for example, that the five-alternative multiple-choice form of quantitative aptitude items might favor subjects who prefer broad categories on category-width measures. Quick, rough approximations to the quantitative solutions might appropriately be judged by these subjects to be "close enough" to a given alternative, whereas "narrow range" subjects may require more time-consuming exact solutions before answering. Significant correlations between category preferences and quantitative aptitude tests have indeed been obtained and have been found to vary widely as a function of the spacing of alternatives on multiple-choice forms of the quantitative aptitude tests (Messick & Kogan, 1963b).
The Ethics of Selection

In considering personality measures of potential utility in the evaluative context of college performance, I have tried to give the impression that many measures are available but none is satisfactory when systematically evaluated against psychometric standards. In addition, I have tried to give some indication of the rapidly advancing technology that is evolving in personality measurement to support research efforts. In the relatively near future, this technology may produce personality measures that are acceptable by measurement and prediction standards, so that the question may soon arise in earnest as to the scope of their practical application. We have considered some of the scientific standards for deciding their appropriateness, but what about the ethical ones?

The choice of any particular personality measure for use, say, in college admission involves an implicit value judgment, which, at the least, should be made explicit in an educational policy that attempts to justify its use. One compelling justification for using personality measures in college selection would be to screen out extreme deviants. Colleges would be well advised, for example, to consider rejecting assaultive or suicidal psychotics, and some schools might wish to eliminate overt homosexuals. The use of personality measures for differentiating among normal subjects might also be justified in terms of empirical validity. After all, as long as there are many more candidates for admission than can be accepted, it seems better to make selections on the basis of valid measures than on the basis of chance. But is empirical validity enough? Validity for what? Certainly the role of the criterion in such an argument must be clearly specified.

The relevant domain of criterion performances should be outlined, and attempts should be made to develop appropriate criterion measures. Since different criterion domains can be defined for different aspects of college success, selection might be oriented toward several of them simultaneously or toward only a few. Consider some of the possibilities. In selection for academic performance, criterion measures might include global grade-point averages, separate grades for different subject-matter fields, or standardized curriculum achievement examinations. In selection for college environment, criterion measures could be set up in terms of desired contributions to extracurricular college life (such as football playing and newspaper editing) or in terms of balancing geographic, social class, sex, and, perhaps, temperament distributions in the student body. If the demands and pressures of the college environment and social structure have been studied, criterion standards might also be specified for selecting students with congenial needs that will fit well with (and hopefully have a higher probability of being satisfied by) the college environment (cf. Stern, 1962). We could also talk in terms of selection for ultimate career satisfaction and selection for desirable personal characteristics (cf. Davis, 1963), or for desirable attitudes.

In each of these cases, it should be emphasized that potential
predictor measures are not evaluated in terms of their empirical
validity for criterion behaviors but rather in terms of their
prediction of criterion measures, which, in turn, are presumed
to reflect the criterion behaviors of interest. And these criterion
measures should be evaluated against the same psychometric
standards as any other measures. Not only should they be
reliable, but also the nature of the attributes measured should
be elucidated in a construct validity framework (Dunnette, 1963).
Since each of these criterion measures may also contain some
specific variance that is not particularly related to the criterion
behaviors, one should also be concerned that an obtained
validity coefficient reflects a correlation with relevant domain
characteristics and not with irrelevant variance incidentally
reflected in the putative criterion measure. Thus, the question of
the intrinsic validity of the predictor and of the criterion measures
should be broached, even if in practice many of the answers
may seem presumptive ones (Gulliksen, 1950). In the last
analysis, ultimate criteria are determined on rational grounds
in any event (Thorndike, 1949). Should a reading comprehension
test predict grades in gunner's mate school (Frederiksen, 1948)?
Should a college that found docile, submissive students receiving
higher grades in freshman courses select on this basis, or should
they consider revising their grading system? Such decisions
might become more difficult if the personality characteristics
involved had more socially desirable labels.

Just as we have been concerned about predicting grades not
as they are but as they should be (Frederiksen, 1954; Fishman,
1958), so too should we be concerned not only with predicting
personality characteristics that are presently considered desirable
for college students but also with deciding which characteristics,
if any, should be considered desirable. It is possible, for example,
that certain prepotent values, such as the desire for diversity,
would override decisions to select students in terms of particular
personal qualities. The very initiation of selection on any given
personality variables might lead to conformity pressures toward
the stereotype implied by the selected characteristics. Apart from
the effects of the selection itself, such pressures to simulate desired
personal qualities would probably decrease diversity in the
college environment and in the personalities of the students. Wolfle
(1960) and others have emphasized the value of diversity and
even the value of uneven acquisition of skills within individuals
as important contributors to the optimal development of talent.
Restrictions upon diversity, however subtle, should therefore be
undertaken cautiously.

I'd like to close metaphorically with a story of the lineage
of King Arthur. At the end of the second book of The Once and
Future King, T. H. White points out that Arthur's half-sister
bore him a son, Modred, who was his ultimate downfall; that
on the eve of the conception Arthur was a very young man
drunk with the spoils of recent victory; that his half-sister was
much older than he and active in the seduction; and that Arthur
did not know that the woman was his sister. But it seems that
"in tragedy, innocence is not enough." And in the use of
personality measures in college admission, empirical validity is not
enough.
I have an uneasy feeling that some of the things that will be
said in this talk on the social consequences of educational
testing may be regarded as somewhat controversial. Let me try
to begin, therefore, with some statements on which we may all
be able to agree.

Popularity and Criticism

Tests have been used increasingly in recent years to make
educational assessments. The reasons for this are not hard to discover.
Educational tests of aptitude and achievement greatly improve
the precision, objectivity and efficiency of the observations on
which educational assessments rest. Tests are not alternatives
to observations. At best they represent no more than refined and
systematized processes of observation.

But the increasing use of tests has been accompanied by an
increasing flow of critical comment. Again the reasons are easy
to see. Tests vary in quality. None is perfect and some may be
quite imperfect. Test scores are sometimes misused. And even
if they were flawless and used with the greatest skill, they would
probably still be unpopular among those who have reason to
fear an impartial assessment of some of their competencies.
Many of the popular articles critical of educational testing that
have appeared in recent years do not reflect any adequate
understanding of educational testing, or a very thoughtful,
unbiased consideration of its social consequences. Most of them
are obvious potboilers for their authors, and sensational
reader-bait in the eyes of the editors of the journals in which they
appear. The writers of some of these articles have paid courteous
[visits to us]. They have listened respectfully to our recitals
of fact and opinion. They have drunk coffee with us and then
taken their leave, presumably to reflect on what they have been
told, but in any event, to write. What appears in print often
seems to be only an elaboration and documentation of their
initial prejudices and preconceptions, supported by atypical
anecdotes and purposefully selected quotations. Educational testing
has not fared very well in their hands.

Among the charges of malfeasance and misfeasance that these
critics have leveled against the test makers there is one of
nonfeasance. Specifically, we are charged with having shown lack of
proper concern for the social consequences of our educational
testing. These harmful consequences, they have suggested, may
be numerous and serious. The more radical among them imply
that, because of what they suspect about the serious social
consequences of educational testing, the whole testing movement
ought to be suppressed. The more moderate critics claim that
they do not know much about these social consequences. But
they also suggest that the test makers don't either, and that it
is the test makers who ought to be doing substantial research
to find out.

The Role of Research

If we were forced to choose between the two alternatives offered
by the critics, either the suppression of educational testing or
extensive research on its social consequences, we probably would
choose the latter without much hesitation. But it is by no means
clear that what testing needs most at this point is a large
program of research on its social consequences. Let me elaborate.
Research can be extremely useful, but it is far from being a
sure-fire process for finding the answers to any kind of a
question, particularly a social question, that perplexes us. Nor is
research the only source of reliable knowledge. In the social
sciences, at least, most of what we know for sure has not come
out of formal research projects. It has come instead from the
[...] of a very large number of more or less incidental
observations [...] in natural, rather than experimental,
situations. There are good reasons why research on human
behavior tends to be difficult, and often unproductive, but that is
a story we cannot go into now.

For present purposes, only two points need to be mentioned.
The first is that the scarcity of formal research on the social
consequences of educational testing should not be taken to mean
that there is no reliable knowledge about those consequences, or
that those engaged in educational testing have been callously
indifferent to its social consequences. The second is that scientific
research on human behavior may require commitment to values
that are in basic conflict with our democratic concerns for
individual welfare. If boys and girls are used as carefully controlled
experimental subjects in tough-minded research on social issues
that really matter, not all of them will benefit, and some may
be disadvantaged seriously. Our society is not yet ready, and
perhaps should never become ready, to acquiesce in that kind
of scientific research.

Harmful Consequences

Before proceeding further, let us mention specifically a few of the
harmful things that critics have suggested educational testing may
do:

• It may place an indelible stamp of intellectual status (superior,
mediocre or inferior) on a child, and thus predetermine
his social status as an adult, and possibly also do irreparable
harm to his self-esteem and his educational motivation.

• It may lead to a narrow conception of ability, encourage
pursuit of this single goal, and thus tend to reduce the
diversity of talent available to society.

• It may place the testers in a position to control education
and determine the destinies of individual human beings, while,
incidentally, making the testers themselves rich in the process.

• It may encourage impersonal, inflexible, mechanistic
processes of evaluation and determination, so that essential
human freedoms are limited or lost altogether.

These are four of the most frequent and serious tentative
indictments. There have been, of course, many other suggestions
of possible harmful social consequences of educational testing.
It may emphasize individual competition and success, rather than
social cooperation, and conflict with the cultivation of
democratic ideals of human equality. It may foster conformity rather
than creativity. It may involve cultural bias. It may neglect
important intangibles. It may, particularly in the case of
personality testing, involve unwarranted and offensive invasions of
privacy. It may do serious injustice in particular individual
cases. It may reward specious test-taking skill, or penalize the
lack of it.

If time and our supply of ideas permitted, it would be well
for us to consider all of these possibilities. But since they do not,
perhaps the demands of the topic may be reasonably well met
if we limit attention to the first four items mentioned as possibly
harmful consequences of educational testing, namely:

• permanent status determination

• limited conceptions of ability

• domination by the testers

• mechanistic decision making

At this point in the presentation, a major choice must be made.
Shall we explore the foundations for these apprehensions and
attempt to dispel them? Shall we, in other words, attempt to
refute the allegations of harmful social consequences of educational
testing? Clearly most of these social dangers can be, and
probably have been, exaggerated. Little solid evidence exists to justify
the fears that have been expressed with such apparent concern.

Or shall we assume that the concerns which have been
expressed are not [groundless]? Shall we, therefore, set as our
task the discovery and delineation of things that might be done
by those who make and use tests to limit the causes for concern?
[It seemed] that for one speaking to a group of specialists in
educational testing, the second course of action was clearly the
more reasonable, and would be likely to be the more useful. So
that is the course that has been chosen.

PERMANENT STATUS DETERMINATION

Consider first, then, the danger that educational testing may
place an indelible stamp of inferiority on a child, ruin his
self-esteem and educational motivation, and determine his social
status as an adult. The kind of educational testing most likely
to have these consequences would involve tests purporting to
measure a person's permanent general capacity for learning.
These are the intelligence tests, and the presumed measures of
general capacity for learning they provide are popularly known
as IQ's.

Most of us here assembled are well aware of the fact that there
is no direct, unequivocal means for measuring permanent
general capacity for learning. It is not even clear to many of us
that, in the state of our current understanding of mental
functions and the learning process, any precise and useful meaning
can be given to the concept of "permanent general capacity for
learning." We know that all intelligence tests now available are
direct measures only of achievement in learning, including
learning how to learn, and that inferences from scores on those tests
to some native capacity for learning are fraught with many
hazards and uncertainties.

But many people who are interested in education do not know
this. Many of them believe that native intelligence has been
clearly identified and is well understood by expert psychologists.
They believe that a person's IQ is one of his basic, permanent
attributes, and that any good intelligence test will measure it
with a high degree of precision. They do not regard an IQ
simply as a test score, a score that may vary considerably
depending on the particular test used and the particular time
when the person was tested.
Whether or not a person's learning is significantly influenced
by his predetermined capacity for learning, there is no denying
the obvious fact that individual achievements in learning exhibit
considerable consistency over time and across tasks. The
superior elementary school pupil may become a mediocre secondary
school pupil and an inferior college student, but the odds are
against it. Early promise is not always fulfilled, but it is more
often than not. The A student in mathematics is a better bet than
the C student to be an A student in English literature as well,
or in social psychology.

On the other hand, early promise is not always followed by
late fulfillment. Ordinary students do blossom sometimes into
outstanding scholars. And special talents can be cultivated. There
is enough variety in the work of the world so that almost
anyone can discover some line of endeavor in which he can develop
more skill than most of his fellow men.

In a free society that claims to recognize the dignity and worth
of every individual, it is better to emphasize the opportunity
for choice and the importance of effort than to stress genetic
determinism of status and success. It is better to emphasize the
diversity of talents and tasks than to stress general excellence
or inferiority. It is important to recognize and to reinforce what
John Gardner has called "the principle of multiple chances," not
only across time but also across tasks.

The concept of fixed general intelligence, or capacity for
learning, is a hypothetical concept. At this stage in the development
of our understanding of human learning, it is not a necessary
hypothesis. Socially, it is not now a useful hypothesis. One of
the important things test specialists can do to improve the social
consequences of educational testing is to discredit the popular
conception of the IQ. Wilhelm Stern, the German psychologist
who suggested the concept originally, saw how it was being
overgeneralized and charged one of his students coming to
America to "kill the IQ." Perhaps we would be well advised,
even at this late date, to renew our efforts to carry out his
wishes.

Recent emphasis on the early identification of academic talent
involves similar risks of [...] overemphasizing its predetermined
components. If we think of talent mainly as something that is
genetically given, we will run [...] differently than if we think
of it mainly as something that can be educationally developed.

If human experience, or that specialized branch of human
experience we call scientific research, should ever make it quite
clear that differences among men in achievement are largely
due to genetically determined differences in talent, then we ought
to accept the finding and restructure our society and social
customs in accord with it. But that is by no means clear yet, and
the structure and customs of our society are not consistent with
such a basic assumption. For the present, it will be more
consistent with the facts as we know them, and more constructive for
the society in which we live, to think of talent not as a natural
resource like gold or uranium to be discovered, extracted and
refined, but as a synthetic product like fiberglass or D.D.T.,
something that, with skill, effort and luck, can be created and
produced out of generally available raw materials to suit our
particular needs or fancies.

This means, among other things, that we should judge the
value of the tests we use not in terms of how accurately they
enable us to predict later achievement, but rather in terms of how
much help they give us to increase achievement by motivating
and directing the efforts of students and teachers. From this point
of view, those concerned with professional education who have
resisted schemes for very long-range predictions of aptitude for,
or success in, their professions have acted wisely. Not only is
there likely to be much more of dangerous error than of useful
truth in such long-range predictions, but also there is implicit in
the whole enterprise a deterministic conception of achievement
that is not wholly consistent with the educational facts as we
know them, and with the basic assumptions of a democratic,
free society.

Whenever I try to point out that prediction is not the exclusive,
or even the principal, purpose of educational measurement, some
of my best and most intelligent friends demur firmly, or smile
politely to communicate that they will never accept such heretical
nonsense. When I imply that they use the term "prediction" too
loosely, they reply that I conceive it too narrowly. Let me try
[once more] to achieve a meeting [of minds].

I agree that prediction has to do with the future, and that the
future ought to be of greater concern to us than the past. I
agree, too, that a measurement must be related to some other
measurements in order to be useful, and that these relationships
provide the basis for, and are tested by, predictions. But these
relationships also provide a basis, in many educational endeavors,
for managing outcomes: for making happen what we want to
happen. And I cannot agree that precision in language or clarity
of thought is well served by referring to this process of
controlling outcomes as just another instance of prediction. The
etymology and common usage of the word "prediction" imply to
me the process of foretelling, not of controlling.

The direct, exclusive, immediate purpose of measurement is
always description, not either prediction or control. If we know
with reasonable accuracy how things now stand (descriptions),
and if we also know with reasonable accuracy what leads to what
(functional relations), we are in a position to foretell what will
happen if we keep hands off (prediction) or manipulate the
variables we can get our hands on to make happen what we
want to happen (control). Of course, our powers of control are
often limited and uncertain, just as our powers of prediction are.
But I have not been able to see what useful purpose is served
by referring to both the hands-off and the hands-on operations
as prediction, as if there were no important difference between
them. It is in the light of these semantic considerations that I
suggest that tests should be used less as bases for prediction of
achievement, and more as means to increase achievement. I
think there is a difference, and that it is important educationally.

LIMITED CONCEPTIONS OF ABILITY

Consider next the danger that a single widely used test or test
battery for selective admission or scholarship awards may foster
an undesirably narrow conception of ability and thus tend to
reduce diversity in the talents available to a school or to society.
Here again, it seems, the danger is not wholly imaginary.

Basic as verbal and [quantitative] skills are to many phases of
educational achievement, they do not encompass all phases of
achievement. The application of a common yardstick of aptitude
or achievement to all pupils is operationally much simpler than
the use of a diversity of yardsticks, designed to measure different
aspects of achievement. But overemphasis on a common test
could lead educators to neglect those students whose special
talents lie outside the common core.

Those who manage programs for the testing of scholastic
aptitude always insist, and properly so, that scores on these
tests should not be the sole consideration when decisions are
made on admission or the award of scholarships. But the
question of whether the testing itself should not be varied from
person to person remains. The use of optional tests of
achievement permits some variation. Perhaps the range of available
options should be made much wider than it is at present to
accommodate greater diversity of talents.

The problem of encouraging the development of various kinds
of ability is, of course, much broader than the problem of
testing. Widespread commitment to general education, with the
requirement that all students study identical courses for a
substantial part of their programs, may be a much greater deterrent
of specialized diversity in the educational product. Perhaps these
requirements should be restudied too.

DOMINATION BY THE TESTERS

What of the concern that the growth of educational testing may
increase the influence of the test makers until they are in a
position to control educational curricula and determine the destinies
of students?

Those who know well how tests are made and used in American
education know that the tests more often lag than lead curricular
changes, and that while tests may affect particular episodes in a
student's life, they can hardly ever be said to determine
a student's destiny. American education is, after all, a manifold,
[...] organized enterprise. [...] it restricts the student too much
or too little is a subject for lively debate, [...] destiny,
not nearly so close as did examinations in some other
countries [...].

But test makers have, I fear, sometimes given the general
public reason to fear that we may be up to no good. I refer to
our sometime reluctance to take the layman fully into our
confidence, to share fully with him all our information about
test scores, the tests from which they were derived, and our
interpretations of what they mean.

Secrecy concerning educational tests and test scores has been
justified on several grounds. One is that the information is
simply too complex for untrained minds to grasp. Now it is true
that some pretty elaborate theories can be built around our
testing procedures. It is also true that we can perform some
fancy statistical manipulations with the scores they yield. But
the essential information revealed by the scores on most
educational tests is not particularly complex. If we understand it
ourselves, we can communicate it clearly to most laymen without
serious difficulty. To be quite candid, we are not all that much
brighter than they are, much as we may sometimes need the
reassurance of thinking so.

Another justification for secrecy is that laymen will misuse test
scores. Mothers may compare scores over the back fences. The
one whose child scores high spreads the word around. The one
whose child scores low may keep the secret, but seek other
grounds for urging changes in the teaching staff or in the
educational program. Scores of limited meaning may be treated
with undue respect and used to repair or to injure the student's
self-esteem rather than to contribute to his learning.

Again, it is true that test scores can be misused. They have
been in the past and they will be in the future. But does this
justify secrecy? Can we minimize abuses due to ignorance by
withholding knowledge? Do we flatter our fellow citizens when
we tell them, in effect, that they are too [...] or too lacking
in character to be trusted with the knowledge of their children,
or of themselves, that we possess?

Seldom acknowledged, but very persuasive as a practical
reason for secrecy regarding test scores, is that it spares those
who use the scores from having to explain and justify the
decisions they make. Preference is not, and should not always be,
given to the person whose test score is the higher. If score
information is withheld, the disappointed applicant will assume
that it was because of his low score, not because of some other
factor. He will not trouble officials with demands for justification
of a decision that, in some cases, might be hard to justify. But
all things considered, more is likely to be gained in the long run
by revealing the objective evidence used in reaching a decision.
Should the other, subjective considerations prove too difficult
to justify, perhaps they ought not to be used as part of the basis
for decision.

If specialists in educational measurement want to be properly
understood and trusted by the public they serve, they will do
well to shun secrecy and to share with the public as much as
it is interested in knowing about the methods they use, the
knowledge they gain, and the interpretations they make. This
is clearly the trend of opinion in examining boards and public
education authorities. Let us do what we can to reinforce the
trend. Whatever mental measurements are so esoteric or so
dangerous socially that they must be shrouded in secrecy probably
should not be made in the first place.

Testers do not control education or the destinies of individual
students. By the avoidance of mystery and secrecy, they can
help to create better public understanding and support.

MECHANISTIC DECISION MAKING

Finally, let us consider briefly the possibility that testing may
encourage mechanical decision making, at the expense of essential
human freedoms of choice and action.

Those who work with mental tests often say that the purpose
of all measurement is prediction. They use regression equations
to predict grade point averages, or contingency tables to predict
the chances of various degrees of success. Their procedures may
seem to imply not only that human behavior is part of a
deterministic system in which the number of relevant variables is
manageably small, but also that the proper goals of human
behavior are clearly known and universally accepted.

In these circumstances, there is some danger that we may
forget our own inadequacies and attempt to play God with the lives
of other human beings. We may find it convenient to overlook
the gross inaccuracies that plague our measurements, and the
great uncertainties that bedevil our predictions. Betrayed by
overconfidence in our own wisdom and virtue, we may project our
particular value systems into a pattern of ideal behavior for all
men.

If these limitations on our ability to mould human behavior
and to direct its development did not exist, we would need to
face the issue debated by B. F. Skinner and Carl Rogers before
the American Psychological Association some years ago. Shall
our knowledge of human behavior be used to design an ideal
culture and condition individuals to live happily in it at
whatever necessary cost to their own freedom of choice and action?

But the aforementioned limitations do exist. If we ignore them
and undertake to manage the lives of others so that those others
will qualify as worthy citizens in our own particular vision of
Utopia, we do justify the concern that one harmful social
consequence of educational testing may be mechanistic decision
making and the loss of essential human freedoms.

A large proportion of the decisions affecting the welfare and
destiny of a person must be made in the midst of overwhelming
uncertainties concerning the outcomes to be desired and the best
means of achieving such outcomes. That many mistakes will be
made seems inevitable. One of the cornerstones of a free society
is the belief that in most cases it is better for the person most
concerned to make the decision, right or wrong, and to take the
responsibility for its consequences, good or bad.

The implications of this for educational testing are clear. Tests
should be used as little as possible to impose decisions and
courses of action on others. They should be used as much as
possible to provide a sounder basis of choice in individual
decision making. Tests can be used, and ought to be used, to support
rather than to limit human freedom and responsibility.

Conclusion

In summary, we have suggested here today that those who make 
and use educational tests might do four things to alleviate public 
concerns over their possibly adverse social consequences: 

1. We could emphasize the use of tests to improve status, and
de-emphasize their use to determine status.

2. We could broaden the base of achievements tested to
recognize and develop the wide variety of talents needed in our
society.

3. We could share openly with the persons most directly
concerned all that tests have revealed to us about their abilities
and prospects.

4. We could decrease the use of tests to impose decisions on
others, and instead increase their use as a basis for better
personal decision making.

When Paul Dressel read a draft of this paper, he chided me
gently on what he considered to be a serious omission. I had
failed to discuss the social consequences of not testing. What
are some of these consequences?

If the use of educational tests were abandoned, the distinctions
between competence and incompetence would become more difficult
to discern. Dr. Nathan Womack, former president of the
National Board of Medical Examiners, has pointed out that
only to the degree to which educational institutions can define
what they mean by competence, and determine the extent to
which it has been achieved, can they discharge their obligation
to deliver competence to the society they serve.

If the use of educational tests were abandoned, the encouragement
and reward of individual efforts to learn would be made
more difficult. Excellence in programs of education would become
less tangible as a goal and less demonstrable as an attainment.
Educational opportunities would be extended less on the
basis of aptitude and merit and more on the basis of ancestry
and influence; social class barriers would become less permeable.
Decisions on important issues of curriculum and method would
be made less on the basis of solid evidence and more on the
basis of prejudice or caprice.

These are some of the social consequences of not testing. In
our judgment, they are potentially far more harmful than any
possible adverse consequences of testing. But it is also our
judgment, and has been the theme of this paper, that we can do much
to minimize even these possibilities of harmful consequences. Let
us, then, use educational tests for the powerful tools they are,
with energy and skill, but also with wisdom and care.

JANEBA, HUGO B., Rutherford (New Jersey) Senior High School
JARECKE, WALTER H., West Virginia University
JASPEN, NATHAN, New York University
JOHNSON, [illegible], Newark
JOHNSON, [illegible]
[illegible]
[illegible], University of North Carolina
JUOLA, ARVO E., Michigan State University
[illegible], The City College of New York
KAMMAN, JAMES, University [illegible]
KARAS, SHAWKY, Educational Testing Service
[illegible], New York City Board of Education
KATHLEEN, SISTER MARY, College of St. Elizabeth
KATZ, MARTIN R., Educational Testing Service
KATZELL, MRS. RAYMOND A., [illegible]
KAUFMAN, ELEONORA J., The University of Chicago
KEITH, REVEREND GREGORY, St. Anselm's College
KELLEY, PAUL R., National Board of Medical Examiners
KELLEY, H. PAUL, University of Texas
KELLEY, MRS. H. PAUL, Austin, Texas
KELLY, E. LOWELL, University of Michigan
KENDALL, LORNE M., Educational Testing Service
KENDALL, W. E., The Psychological Corporation
KENNEDY, [illegible], Texas Technological College
KENNEY, HELEN J., Harvard University
[illegible], CLARENCE L., Virginia State Department of Education
KERSTING, ETHEL, Educational Testing Service
[illegible] Institute of Technology
[illegible], JOHN, Harker Preparatory School, Potomac, Maryland
KIRKPATRICK, FORREST H., Wheeling Steel Corporation
KLINE, WILLIAM E., Baltimore County (Maryland) Public Schools
[illegible], Educational Testing Service
KOCH, [illegible], JR., Madison (New Jersey) High School
KOGAN, LEONARD S., Brooklyn College
KOGAN, NATHAN, Educational Testing Service
KOOB, REVEREND C. ALBERT, National Catholic Educational Association
[illegible]
KRUG, ROBERT, American Institute for Research
KUJAWSKI, CARL J., The Atlantic Refining Company
[illegible], NORMA, New York State Department of Civil Service
[illegible], SOLOMON, New York State Department of Health
[illegible] Service
[illegible], New York State Department of Education
[illegible], Association of Colleges for Teacher Education
[illegible], Educational [illegible]
[illegible], The University of [illegible]
[illegible], WILLIAM S., Vashon Island High School, Burton, Washington
LAUDAN, [illegible], Educational Testing Service
LAVIN, MARIANN, [illegible] (New York) High School
[illegible], DIANA M., Educational Testing Service
[illegible], CARLTON B., Boston College
[illegible], International Business Machines Corporation
[illegible], East Stroudsburg State College, Pennsylvania
LENNON, ROGER T., Harcourt, Brace and World, Inc.
LEVINE, ABRAHAM, Office of Naval Research, Washington, D.C.
LEVINE, HAROLD G., New York State Department [illegible]
LEVINE, MILTON, National Science Foundation
LIBERMAN, SAMUEL, Columbia University
LILLY, ROY S., Educational Testing Service
LINDBERG, LUCILE, Queens College
LINDEMAN, RICHARD H., Teachers College, Columbia University
LINDQUIST, E. F., State University of Iowa
LINDVALL, C. M., University of Pittsburgh
[illegible], FRANCES R., Cheltenham Township (Pennsylvania) [illegible]
LINTON, LINDA, New York City Public Schools
LOHNES, PAUL R., State University of New York at Buffalo
LONG, LOUIS, The City College of New York
LONG, WILLIAM F., United States Air Force
LOREE, M. RAY, University of Alabama
LORET, PETER G., Educational Testing Service
LORETAN, JOSEPH O., New York City Board of Education
[illegible], JOHN J., State University of New York at Geneseo
LOWERY, ZEB A., Rutherford County (North Carolina) Board of Education
LUCAS, DIANA D., Educational Testing Service
[illegible], University of Cincinnati
LYONS, WILLIAM, New York State Department of Education
MACBAIN, ROBERT, The Torrington [illegible]
[illegible], State College of Worcester, Massachusetts
MADDI, SALVATORE R., Educational Testing Service
MADDOX, CLIFFORD R., Cedarville College
[illegible], MILTON, Educational Testing Service
MALCOLM, DONALD J., Educational Testing Service
[illegible], DANIEL J., New York State Department of Education
[illegible], REVEREND WALTER A., Sacred Heart Seminary, Detroit, Michigan
MARRON, JOSEPH E., United States Military Academy
[illegible], JAMES V., Educational Testing Service
[illegible], GEORGE L., University of Maryland
MASON, ANDREW, Maryland State Department of [illegible]
[illegible], WILL J., University of Maryland
MATHEWSON, ROBERT H., The City University of New York
MAYER, MARTIN, New York City
MAYFIELD, EUGENE, Life Insurance Agency Management Association
MC CALL, W. C., University of South Carolina
MC CANN, FORBES E., McCann Associates, Philadelphia
MC CARTHY, DOROTHEA, Fordham University
MC CONNELL, JOHN C., Windward School, White Plains, New York
MC CORD, RICHARD B., Philadelphia Personnel Department
MC CULLERS, WAYNE M., New York City Community College
MC DANIEL, SARAH W., Hofstra University
MC DILL, THOMAS H., The Westminster Schools, Atlanta, Georgia
MC GUIRE, CHRISTINE, University of Illinois
MC GUIRE, JOSEPH, New Jersey State Department of Civil Service
MC INTIRE, PAUL H., Boston University
MC KEE, MICHAEL O., United States Government
MC KENNA, [illegible], Crosby High School, Waterbury, Connecticut
MC KENZIE, FRANCIS W., Board of Education, Darien, Connecticut
MC [illegible], JAMES J., Educational Testing Service
MC LAUGHLIN, KENNETH F., United States Office of Education
MC LEAN, LESLIE DAVID, University of Wisconsin
MC [illegible], LEO F., JR., Cleaver Company Executive Institute
MC [illegible], THOMAS, University of Pennsylvania
MC NULTY, THEODORE F., Educational Testing Service
[illegible], Massachusetts General Hospital
[illegible], Harvard University
MC [illegible], State University [illegible]
MEDLEY, DONALD M., The City University of New York
[illegible], LOURDES M., New York University
[illegible], National Institute of Education, New [illegible]
MELVILLE, S. DONALD, Educational Testing Service
[illegible], CHARLOTTE, Brooklyn, New York
MERRITT, [illegible], College Entrance Examination Board
[illegible], JACK C., University of [illegible]
MESSICK, SAMUEL J., Educational Testing Service
MICHELI, GENE, United States Naval Training Device Center
MIDDENDORF, LORNA, Roosevelt Junior High School, Westfield, New Jersey
MILES, [illegible], Falls Church (Virginia) High School
[illegible], The Psychological Corporation
[illegible], DOROTHY F., Clayton (Missouri) [illegible]
MILLER, HOWARD, North Carolina State College
MILLER, JOYCE, New York University
MILLER, MARIAN, Delaware State Department of Public Instruction
MILLER, PAUL VAN R., JR., Educational Testing Service
MILLMAN, JASON, Cornell University
[illegible], DONALD F., Educational Testing Service
[illegible], Educational Testing Service
MITCHELL, BLYTHE, Harcourt, Brace and World, Inc.
MITZEL, HAROLD E., Pennsylvania State University
MOHNACS, ANNA, National Catholic Educational Association
MOHNACS, MARY, National Catholic Educational Association
MOLLENKOPF, WILLIAM G., Procter and Gamble Company
[illegible], MAXINE R., Educational Testing Service
MORGAN, HENRY H., The Psychological Corporation
MORIARTY, DORIS, Educational Testing Service
MORRISON, ALEXANDER W., Polytechnic Institute of Brooklyn
MOSELY, RUSSELL, Wisconsin State Department of Public Instruction
[illegible], MRS. O., Windsor Mountain School, Lenox, Massachusetts
MULRY, JUNE, Indiana University
MURRAY, VIRGINIA E., Educational Testing Service
MYERS, [illegible], Educational Testing Service
MYERS, ISABEL BRIGGS, Swarthmore, Pennsylvania
[illegible]
NELSON, [illegible], University of [illegible]
NELSON, H. ROBERT, Highland Park (New Jersey) High School
[illegible], MARGARET H., Educational Testing Service
[illegible], ETHEL R., New York State Department of Civil Service
NOLAN, DAVID M., Educational Testing Service
NOLL, VICTOR H., Harcourt, Brace and World
NORA, SISTER MARY, S.N.D., National Catholic Educational Association
NORTH, ROBERT D., Educational Records Bureau
NORTON, ELIZABETH, Teachers College, Columbia University
NORTON, MARGARET, Hopewell, New Jersey
NOSOW, SIGMUND, Michigan State University
NULTY, FRANCIS, Educational Testing Service
OHNMACHT, FRED W., University of Maine
[illegible], New York City Department of Personnel
O'KEEFE, JOHN J., Science Research Associates
OPPENHEIM, DON B., Teachers College, Columbia University
ORAHOOD, ELIZABETH, Kansas City (Missouri) Public Schools
[illegible], ELIZABETH, New York State Department of Civil Service
ORLEANS, JOSEPH B., George Washington High School, New York City
ORR, DAVID B., American Institute for Research
OSCARSON, DONALD, The Taft School, Watertown, Connecticut
OSGOOD, STANLEY W., Houghton Mifflin Company
OTIS, ROBERT, California Test Bureau, Fulton, New York
OTTOBRE, FRANCES M., Educational Testing Service
[illegible], ROBERT O., State University of New York at Buffalo
PACKARD, ALBERT G., Baltimore City (Maryland) Public Schools
PAGE, ELLIS B., University of Connecticut
PALLRAND, GEORGE J., Princeton University
PALMER, ORMOND, Michigan State University
PALMER, ORVILLE B., Educational Testing Service
PALUBINSKAS, ALICE L., Tufts University
PAPPAS, ANGELINE J., Horace Greeley High School, Chappaqua, New York
PATTERSON, SUSAN, New York University
[illegible], SISTER M., Cardinal Stritch College
PAYNE, DAVID A., Syracuse University
PELIKAN, PHYLLIS K., New York University
PERRY, WILLIAM, University of North Carolina
[illegible], DONALD, Life Insurance Agency Management Association
[illegible], Baltimore City (Maryland) Public Schools
[illegible], Educational Testing Service
PITCHER, BARBARA, Educational Testing Service
[illegible], NORMAN, New York State Civil Service Department
[illegible], RICHARD, Syracuse University
POOLE, MARY H., Erie (Pennsylvania) School District
PRUZEK, [illegible], University of Wisconsin
PRUZEK, ROBERT, University of Wisconsin
PURCELL, WILLIAM D., Summit (New Jersey) Public Schools
PURDY, ROBERT D., Syracuse City (New York) School District
[illegible], ALBERT, Philadelphia Personnel Department
[illegible], JOHN S., JR., Harcourt, Brace and World, Inc.
[illegible], Harcourt, Brace and World, Inc.
RABINOWITZ, WILLIAM, The City University of New York
RANDOLPH, LAWRENCE, Tenafly (New Jersey) High School
RAPHAEL, BROTHER ALOYSIUS, F.S.C., Bishop Loughlin Memorial High School, Brooklyn
RAPPARLIE, JOHN H., The Owens-Illinois Glass Company
READ, THOMAS, Hampton Roads Academy, Newport News, Virginia
[illegible], MARY K., Educational Testing Service
[illegible], BERNARD A., Trenton State College
REED, REVEREND LORENZO K., S.J., Jesuit Educational Association
REELING, GLENN E., Montclair (New Jersey) Public Schools
REID, CATHERINE F., Hunter College
REID, JOHN W., Indiana State College, Pennsylvania
REILLY, JAMES J., St. John's University
[illegible], Educational Testing Service
[illegible], WILLIAM H., Educational Testing Service
REYNOLDS, HARLAN J., International Business Machines Corporation
RHODES, DORIS L., Educational Testing Service
RHUM, GORDON J., State College of Iowa
RICHARDINE, SISTER MARY, B.V.M., National Catholic Educational Association
RICHARDS, JAMES M., Educational Testing Service
RICKS, JAMES H., JR., The Psychological Corporation
ROBERTS, [illegible], Pennsylvania State Civil Service Commission
ROBINSON, DONALD, Phi Delta Kappa
[illegible], Harcourt, Brace and World, Inc.
[illegible], Educational Testing Service
ROHRBAUGH, [illegible], Pennsbury Senior High School, Yardley, Pennsylvania
ROMBERG, THOMAS, National Longitudinal Study of Mathematical Abilities
ROSEBOROUGH, HOWARD, McGill University
[illegible], JULIUS, New York City Public Schools
ROSS, WESLEY F., University of Kentucky
ROSIER, DONALD, New Jersey Education Association
ROWLAND, WILMINA, The United Presbyterian Church, Board of Christian Education
[illegible], LORRAINE P., National League for Nursing
SANAZARO, PAUL J., Association of American Medical Colleges
SANBORN, MARSHALL P., University of Wisconsin
SANFORD, RUTH C., West Hempstead (New York) Junior-Senior High School
[illegible]
SASLOW, MAX S., New York City Department of Personnel
[illegible], ROSE M., Educational Testing Service
SCHLEKAT, GEORGE A., Educational Testing Service
SCHNEIDER, HARRIET L., National League for Nursing
[illegible], JOSEPH P., University of Houston
SCHRADER, WILLIAM B., Educational Testing Service
SCHULTZ, CHARLES B., Educational Testing Service
SCHULTZ, DOUGLAS, Applied Psychological Services
[illegible], DELPHIN L., The Lutheran Church-Missouri Synod
SCHUMACHER, CHARLES F., National Board of Medical Examiners
SCHWARTZMAN, ALEX E., McGill University
[illegible], LEONARD, Dean Junior College
SCOTT, C. WINFIELD, Rutgers, The State University
SCOTT, MARY HUGHIE, National Education Association
SCRIBNER, PETER C., Harcourt, Brace and World, Inc.
SEASHORE, HAROLD, The Psychological Corporation
SEIBEL, DEAN W., Educational Testing Service
[illegible], Shaker Heights (Ohio) High School
[illegible], ROBERT P., Educational Testing Service
SERLIN, ALBERT M., Educational Testing Service
[illegible], New York City Department of Personnel
[illegible], RICHARD, United States Military Academy
SHARP, CATHERINE O., Educational Testing Service
SHAYCOFT, MARION F., American Institute for Research
SHEA, JAMES, New York State Department of Civil Service
[illegible], MARCIA, University of Rochester
[illegible], Irvington (New Jersey) High School
[illegible], MARY, National League for Nursing
[illegible], Jefferson County (Kentucky) Education Center
SHIMBERG, BENJAMIN, Educational Testing Service
SIEGEL, ARTHUR, Applied Psychological Services
SIEGEL, LAURENCE, Miami University
SILVEY, HERBERT M., State College of Iowa
SIMANDLE, SIDNEY, Kentucky State Department of Education
SJOBERG, LENNART, Educational Testing Service
SKAGER, [illegible], Educational Testing Service
[illegible], Pennsylvania [illegible]
SMITH, ALBERT P., United States Marine Corps
[illegible] Connecticut State College
SMITH, [illegible], Rhode Island College
SMITH, [illegible], Educational Testing Service
SMITH, JOHN W., Educational [illegible]
SMITH, MARSHALL, [illegible] College
SMITH, MARSHALL S., Harvard University
SMITH, ROBERT E., Educational Testing Service
SMITH, VIRGINIA A., New York University
SNODGRASS, ROBERT, Purdue University
SOLOMON, ROBERT J., Educational Testing Service
SOMMER, JOHN, Houghton Mifflin Company
SOUTHER, MARY T., Tower Hill School, Wilmington, Delaware
SOUTHWORTH, J. ALFRED, University of Massachusetts
SPAIN, CLARENCE J., Schenectady (New York) Public Schools
SPEER, GEORGE S., Illinois Institute of Technology
SPENCE, JAMES R., State University of New York at Albany
SPENCER, RICHARD E., Pennsylvania State University
SPITZER, ROBERT L., Biometrics Research
SPRAGUE, ARTHUR R., Hunter College
[illegible], Educational Testing Service
SQUIRES, JO ANNE, Hampton Roads [illegible] Center
STAKE, ROBERT E., University of Illinois
[illegible], New York City Public Schools
[illegible], CHARLES R., State [illegible]
[illegible], ARTHUR M.
STERN, GEORGE G., Syracuse University
[illegible], JACK I., New York City
STEWART, BLAIR, Associated Colleges of the Midwest
STEWART, CLIFFORD T., University of [illegible]
STEWART, ELIZABETH, [illegible]
STEWART, NAOMI, [illegible]
[illegible], New York City
STICKELL, DAVID W., [illegible]
STILES, GRACE ELLEN, University [illegible]
STOKER, HOWARD W., [illegible]
STONE, PAUL T., [illegible]
STOUGHTON, ROBERT W., Connecticut State Department of Education
[illegible], SAMUEL, New York City Board of Education
[illegible]
STRICKER, LAWRENCE J., Educational Testing Service
[illegible], MONSIGNOR, Diocese of Mobile-Birmingham, Alabama
[illegible], University of Maryland
SUPER, DONALD E., Teachers College, Columbia University
SUSSMAN, LEAH, Newark (New Jersey) [illegible] System
[illegible]
[illegible]
SWANSON, EDWARD, University [illegible]
SWINEFORD, FRANCES, [illegible]
TAYLOR, [illegible]
TAYLOR, MARVIN, [illegible]
[illegible] Service Department
[illegible], JOSEPH B., Educational Testing Service
[illegible], RAYMOND, Educational [illegible]
THORNDIKE, ROBERT L., Teachers College, Columbia University
TRAXLER, ARTHUR E., Educational Records Bureau
[illegible], MRS. ARTHUR, New York City
[illegible], LAURA M., Northern Valley Regional High School, Demarest, New Jersey
TRIGGS, FRANCES O., Committee on Diagnostic Reading Tests, Inc.
TRISMEN, DONALD A., Educational Testing Service
TUCKER, DONALD K., Northeastern University
TUCKER, LEDYARD R, University of Illinois
TULLY, G. EMERSON, Florida State University
TURNBULL, WILLIAM W., Educational Testing Service
TWYFORD, LORAN, New York State Department of Education
TYOT, MATILDA, [illegible] School, Wallingford, Connecticut
UMANS, SHELLEY, New York City Board of Education
[illegible]
VALLEY, JOHN, Educational Testing Service
VAN AUSDAL, LEIGH, Science Research Associates
VAN HORN, MARY JANE, Baltimore County (Maryland) Schools
[illegible], Educational Testing Service
[illegible], JOHN A., Camden City (New Jersey) Schools
VON FANGE, ERICH A., Concordia College
[illegible], Harcourt, Brace and World, Inc.
[illegible], Bloomsburg State College, Pennsylvania
WAGNER, [illegible], West Virginia University
WAHLGREN, HARDY L., State University of New York at Geneseo
WALDEN, CARRIE H., New York University
WALKER, HOWARD, Mt. Vernon (New York) Public Schools
[illegible], ROBERT N., Personnel Press, Princeton, New Jersey
WALLACE, WIMBURN L., The Psychological Corporation
WALLACH, PHILIP C., Wallach Associates, Inc.
WALTHER, REGIS, United States Department of State
WALTHEW, JOHN K., Educational Testing Service
WALTON, WESLEY W., Educational Testing Service
[illegible], HAL, New York State Department of Civil Service
WANTMAN, MOREY J., Educational Testing Service
[illegible], Volusia County (Florida) Board of Public Instruction
[illegible], Educational Testing Service
WATKINS, RICHARD W., Educational Testing Service
WATSON, WILLIAM, The Cooper [illegible]
WEBB, SAM C., Emory University
[illegible], HAROLD, San Francisco (California) Schools
[illegible], MAX, Brooklyn College
WEISS, JOSEPH, Polytechnic Institute of Brooklyn
WESMAN, MRS. ALEXANDER G., New York City
WHITE, [illegible], New York City Department of Personnel
[illegible], United States Department of Health, Education, and Welfare
WHITLA, DEAN K., Harvard University
WHITNEY, ALFRED D., Life Insurance Agency Management Association
[illegible], SOLOMON, New York City Department of Personnel
WILEY, ISABEL C., Pennsbury High School, Yardley, Pennsylvania
[illegible], MARGUERITE, New York University
[illegible], WALTER H., New York University
WILLARD, RICHARD W., Massachusetts Institute of Technology
[illegible], Groton (Massachusetts) School
WILLEMIN, LOUIS P., United States Army Personnel [illegible]
WILLIAMS, HENRY A., JR., Pensacola (Florida) Junior College
WILLIAMS, ROGER K., Morgan State College, Maryland
[illegible], CLARENCE, Norwich University
[illegible], MARILYN, Educational Testing Service
WINGO, ALFRED L., Virginia State Department of Education
[illegible], CASIMIR, United States Naval Examining Center
WINTERBOTTOM, JOHN A., Educational Testing Service
WISE, HAROLD L., Western Reserve University
WOEHLKE, ARNOLD B., International Business Machines Corporation
WOLF, RICHARD, University of Chicago
WOOD, BEN D., Columbia University
WOOLLATT, LORNE H., New York State Department of Education
WRIGHT, WILBUR H., State University of New York at Geneseo
WRIGHTSTONE, J. WAYNE, New York City Board of Education
YABLONSKY, FRANK D., New York City
YABLONSKY, SHIRLEY, New York University
YOXALL, GEORGE J., Inland Steel Company
ZACCARIA, LUCY C., University of Illinois
[illegible], The City College of New York
[illegible], PASQUALE J., Philadelphia Personnel Department
[illegible], Board of Education, Darien, Connecticut
ZIMILES, HERBERT, Bank Street College
ZOLA, [illegible], Niskayuna High School, Schenectady, New York
[illegible], DONOVAN, Board of Examiners for the Foreign Service
[illegible], HAROLD, New York City Board of Education